Generate Fake Data from Real Dataset Structure
Usage
generate_fake_data(
  data,
  n = 30,
  category_mode = c("preserve", "generic", "custom"),
  numeric_mode = c("range", "distribution"),
  column_mode = c("keep", "generic", "custom"),
  custom_levels = NULL,
  custom_names = NULL,
  seed = NULL,
  verbose = FALSE,
  sensitive = NULL,
  sensitive_detect = TRUE,
  sensitive_strategy = c("fake", "drop"),
  normalize = TRUE
)
Arguments
- data
A tabular object; it is coerced via prepare_input_data().
- n
Rows to generate (default 30).
- category_mode
One of "preserve","generic","custom".
preserve: sample observed categories by empirical frequency (keeps factors)
generic: replace categories with "Category A/B/..."
custom: use custom_levels[[colname]] if provided
- numeric_mode
One of "range","distribution".
range: uniform between min/max (integers stay integer-like)
distribution: sample observed values with replacement
- column_mode
One of "keep","generic","custom".
keep: keep original column names
generic: rename to var1..varP (mapping stored in attr(name_map))
custom: use the custom_names named vector (old -> new)
- custom_levels
Optional named list of allowed levels per column (for category_mode="custom").
- custom_names
Optional named character vector old->new (for column_mode="custom").
- seed
Optional RNG seed.
- verbose
Logical; print progress.
- sensitive
Optional character vector of original column names to treat as sensitive.
- sensitive_detect
Logical; auto-detect common sensitive columns by name.
- sensitive_strategy
One of "fake","drop". Only applied if any sensitive columns exist.
- normalize
Logical; lightly normalize inputs before generation (trim whitespace, convert percent strings to numeric, parse short date-times to POSIXct).
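
The modes described above can be pictured with a small base-R sketch. This is only an illustration of the documented behavior, not the function's actual implementation: the toy vectors x and f, the data frame df, and every variable name below are assumptions made for the example. Only the attribute name name_map comes from the argument descriptions.

```r
# Illustrative base-R sketch; NOT the package's implementation.
set.seed(42)
n <- 6
x <- c(2L, 5L, 9L, 4L)              # hypothetical integer column
f <- factor(c("a", "a", "a", "b"))  # hypothetical categorical column

# numeric_mode = "range": uniform draws between the observed min and max,
# rounded so that an integer column stays integer-like.
fake_range <- runif(n, min = min(x), max = max(x))
if (is.integer(x)) fake_range <- as.integer(round(fake_range))

# numeric_mode = "distribution": resample observed values with replacement.
fake_dist <- sample(x, size = n, replace = TRUE)

# category_mode = "preserve": sample observed levels by empirical frequency
# and keep the result a factor with the original levels.
freq <- table(f)
fake_cat <- factor(
  sample(names(freq), size = n, replace = TRUE,
         prob = as.numeric(freq) / sum(freq)),
  levels = levels(f)
)

# category_mode = "generic": relabel levels as "Category A", "Category B", ...
generic_cat <- fake_cat
levels(generic_cat) <- paste("Category", LETTERS[seq_along(levels(f))])

# column_mode = "generic": rename columns to var1..varP and keep the
# old -> new mapping as an attribute (assumed here to be a named vector).
df <- data.frame(age = x, group = f)
name_map <- setNames(paste0("var", seq_along(df)), names(df))
names(df) <- unname(name_map)
attr(df, "name_map") <- name_map
```

In this sketch, "range" can produce values that never occurred in the input (anything between the observed min and max), while "distribution" only reuses observed values, which may matter when the original values are themselves sensitive.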