Generate Fake Data from Real Dataset Structure

Usage

generate_fake_data(
  data,
  n = 30,
  category_mode = c("preserve", "generic", "custom"),
  numeric_mode = c("range", "distribution"),
  column_mode = c("keep", "generic", "custom"),
  custom_levels = NULL,
  custom_names = NULL,
  seed = NULL,
  verbose = FALSE,
  sensitive = NULL,
  sensitive_detect = TRUE,
  sensitive_strategy = c("fake", "drop"),
  normalize = TRUE
)

Arguments

data

A tabular object; will be coerced via prepare_input_data().

n

Rows to generate (default 30).

category_mode

One of "preserve", "generic", or "custom".

  • preserve: sample observed categories by empirical frequency (keeps factors)

  • generic: replace categories with "Category A/B/..."

  • custom: use custom_levels[[colname]] if provided
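
The three category modes can be approximated in base R. This is only an illustrative sketch, not the package's internal code: the example factor, the custom level values, and the uniform sampling used for "custom" are all assumptions.

```r
# Illustrative only: base-R approximations of the three category modes.
set.seed(1)
x <- factor(c("a", "a", "a", "b", "c"))
n <- 10

# "preserve": sample observed levels weighted by empirical frequency,
# keeping the result a factor with the original levels
freq <- table(x)
preserve <- factor(
  sample(names(freq), n, replace = TRUE, prob = as.numeric(freq) / sum(freq)),
  levels = levels(x)
)

# "generic": same draws, but levels relabelled "Category A", "Category B", ...
generic <- preserve
levels(generic) <- paste("Category", LETTERS[seq_along(levels(x))])

# "custom": draw from a user-supplied level set (hypothetical values;
# uniform sampling is an assumption)
custom_levels <- list(x = c("low", "mid", "high"))
custom <- factor(
  sample(custom_levels[["x"]], n, replace = TRUE),
  levels = custom_levels[["x"]]
)
```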

numeric_mode

One of "range" or "distribution".

  • range: uniform between min/max (integers stay integer-like)

  • distribution: sample observed values with replacement
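
The difference between the two numeric modes can be sketched in base R. This is an approximation under stated assumptions, not the package's implementation; in particular, sampling whole numbers for integer columns is just one way to keep values integer-like.

```r
# Illustrative only: base-R approximations of the two numeric modes.
set.seed(1)
x_dbl <- c(1.5, 2.7, 9.1, 4.4)
x_int <- c(3L, 8L, 5L, 12L)
n <- 10

# "range": uniform draws between the observed minimum and maximum
range_dbl <- runif(n, min = min(x_dbl), max = max(x_dbl))

# "range" for an integer column: draw whole numbers within the observed range
# (assumed approach for staying integer-like)
range_int <- sample(seq.int(min(x_int), max(x_int)), n, replace = TRUE)

# "distribution": resample the observed values with replacement,
# so only values already present in the data can appear
dist_dbl <- sample(x_dbl, n, replace = TRUE)
```

Note that "range" can produce values never seen in the original data, while "distribution" reuses real observed values, which may matter when the originals are themselves sensitive.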

column_mode

One of "keep", "generic", or "custom".

  • keep: keep original column names

  • generic: rename columns to var1..varP (mapping in attr(name_map))

  • custom: use custom_names named vector (old -> new)
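
A name mapping of the kind described above (original -> output) could be built as follows. The column names are hypothetical, and leaving unlisted columns unchanged in "custom" mode is an assumption rather than documented behaviour.

```r
# Illustrative only: building an original -> output name map in base R.
old <- c("patient_id", "age", "diagnosis")  # hypothetical column names

# "keep": every column maps to itself
keep_map <- setNames(old, old)

# "generic": var1..varP, named by the original column
generic_map <- setNames(paste0("var", seq_along(old)), old)

# "custom": apply a named vector old -> new; columns not listed
# keep their original names (assumption)
custom_names <- c(patient_id = "id", diagnosis = "dx")
custom_map <- setNames(old, old)
custom_map[names(custom_names)] <- custom_names
```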

custom_levels

Optional named list of allowed levels per column (for category_mode="custom").

custom_names

Optional named character vector old->new (for column_mode="custom").

seed

Optional RNG seed.

verbose

Logical; print progress.

sensitive

Optional character vector of original column names to treat as sensitive.

sensitive_detect

Logical; auto-detect common sensitive columns by name.

sensitive_strategy

One of "fake","drop". Only applied if any sensitive columns exist.

normalize

Logical; lightly normalize inputs (trim, %→numeric, short date-times→POSIXct).

Value

A data.frame of n rows with attributes:

  • name_map (named chr: original -> output)

  • column_mode (chr)

  • sensitive_columns (chr; original names)

  • dropped_columns (chr; original names that were dropped)
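
A minimal usage sketch, assuming the package that exports generate_fake_data() is already attached. The input data frame and its column names are made up for illustration, and the guard simply skips the call when the function is not available.

```r
# Hypothetical input: a small data frame with one column flagged as sensitive.
real <- data.frame(
  email = c("a@example.com", "b@example.com", "c@example.com"),
  group = factor(c("x", "y", "x")),
  score = c(10.5, 12.1, 9.8)
)

# Guard so the snippet runs harmlessly if the function is not loaded.
if (exists("generate_fake_data", mode = "function")) {
  fake <- generate_fake_data(
    real,
    n = 20,
    category_mode = "generic",
    numeric_mode = "range",
    column_mode = "generic",
    sensitive = "email",
    sensitive_strategy = "drop",
    seed = 123
  )

  # Inspect the attributes listed under Value
  print(attr(fake, "name_map"))
  print(attr(fake, "column_mode"))
  print(attr(fake, "sensitive_columns"))
  print(attr(fake, "dropped_columns"))
}
```

Setting seed makes repeated calls reproducible, which is useful when the fake data is shared alongside a bug report or example.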