Generate Fake Data from Real Dataset Structure
Usage
generate_fake_data(
data,
n = 30,
category_mode = c("preserve", "generic", "custom"),
numeric_mode = c("range", "distribution"),
column_mode = c("keep", "generic", "custom"),
custom_levels = NULL,
custom_names = NULL,
seed = NULL,
verbose = FALSE,
sensitive = NULL,
sensitive_detect = TRUE,
sensitive_strategy = c("fake", "drop"),
normalize = TRUE
)
Arguments
- data
A tabular object; will be coerced via prepare_input_data().
- n
Rows to generate (default 30).
- category_mode
One of "preserve","generic","custom".
preserve: sample observed categories by empirical frequency (keeps factors)
generic: replace categories with "Category A/B/..."
custom: use custom_levels[[colname]] if provided
- numeric_mode
One of "range","distribution".
range: uniform between min/max (integers stay integer-like)
distribution: sample observed values with replacement
- column_mode
One of "keep","generic","custom".
keep: keep original column names
generic: rename to var1..varP (mapping in attr(name_map))
custom: use custom_names named vector (old -> new)
- custom_levels
Optional named list of allowed levels per column (for category_mode="custom").
- custom_names
Optional named character vector old->new (for column_mode="custom").
- seed
Optional RNG seed.
- verbose
Logical; print progress.
- sensitive
Optional character vector of original column names to treat as sensitive.
- sensitive_detect
Logical; auto-detect common sensitive columns by name.
- sensitive_strategy
One of "fake","drop". Only applied if any sensitive columns exist.
- normalize
Logical; lightly normalize inputs (trim whitespace, convert percentages to numeric, parse short date-times as POSIXct).
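
The sampling rules described for category_mode, numeric_mode, and column_mode can be sketched in base R. This is a minimal illustration of those rules as worded above, not the function's actual implementation: the data frame `real`, the variable names, and the choice to store the mapping in an attribute called "name_map" are assumptions made for the example.

```r
# Base-R sketch of the documented sampling rules (not the package's own code).
set.seed(1)

# Hypothetical input data, invented for illustration
real <- data.frame(
  group  = factor(c("a", "a", "a", "b")),
  score  = c(10L, 12L, 15L, 20L),
  weight = c(1.5, 2.25, 3.0, 4.75)
)
n <- 30

# category_mode = "preserve": sample observed levels by empirical frequency
freq <- prop.table(table(real$group))
fake_group <- factor(
  sample(names(freq), n, replace = TRUE, prob = as.numeric(freq)),
  levels = levels(real$group)
)

# category_mode = "generic": relabel levels as "Category A", "Category B", ...
generic_group <- fake_group
levels(generic_group) <- paste("Category", LETTERS[seq_along(levels(fake_group))])

# numeric_mode = "range": uniform between min and max; integer columns are rounded
fake_score  <- as.integer(round(runif(n, min(real$score), max(real$score))))
fake_weight <- runif(n, min(real$weight), max(real$weight))

# numeric_mode = "distribution": resample observed values with replacement
resampled_weight <- sample(real$weight, n, replace = TRUE)

# column_mode = "generic": rename to var1..varP and keep an old -> new mapping
# (assumption: the mapping lives in an attribute named "name_map")
fake <- data.frame(fake_group, fake_score, fake_weight)
name_map <- setNames(paste0("var", seq_along(real)), names(real))
names(fake) <- unname(name_map)
attr(fake, "name_map") <- name_map
```

In this sketch, "range" can produce values never observed in the input, while "distribution" only ever returns observed values, which is the main trade-off to weigh when choosing a numeric_mode.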
