Generates fake data, writes files (CSV/RDS/Parquet), writes a scrubbed JSON schema, and optionally writes a README prompt and a single ZIP file containing everything.
Usage
llm_bundle(
  data,
  n = 30,
  level = c("medium", "low", "high"),
  formats = c("csv", "rds"),
  path = tempdir(),
  filename = "fake_bundle",
  seed = NULL,
  write_prompt = TRUE,
  zip = FALSE,
  prompt_filename = "README_FOR_LLM.txt",
  zip_filename = NULL,
  sensitive = NULL,
  sensitive_detect = TRUE,
  sensitive_strategy = c("fake", "drop"),
  normalize = FALSE
)
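As a rough illustration of the signature above, a call might look like the following sketch. It assumes the package that exports llm_bundle() is already attached; the output folder name and the use of the built-in iris data are arbitrary choices for the example, and nothing is assumed here about the function's return value.

```r
# Sketch only: assumes the package providing llm_bundle() is already attached.
# The folder name "llm_bundle_demo" is a hypothetical choice for this example.
out_dir <- file.path(tempdir(), "llm_bundle_demo")
dir.create(out_dir, showWarnings = FALSE)

llm_bundle(
  data = iris,                 # built-in dataset used as the frame to mirror
  n = 50,                      # rows in the fake dataset
  level = "high",              # strictest privacy level
  formats = c("csv", "rds"),
  path = out_dir,
  filename = "demo_bundle",
  seed = 42,                   # fixed seed for reproducibility
  zip = TRUE                   # also write a single ZIP archive
)

# Inspect what was written
list.files(out_dir)
```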
Arguments
- data
A data.frame (or coercible) to mirror.
- n
Number of rows in the fake dataset (default 30).
- level
Privacy level: "low", "medium" (default), or "high". Higher levels apply stricter defaults.
- formats
Which data files to write: any of "csv", "rds", "parquet". Default c("csv", "rds").
- path
Folder to write outputs. Default: tempdir().
- filename
Base file name (no extension). Example: "demo_bundle". This becomes files like "demo_bundle.csv", "demo_bundle.rds", etc.
- seed
Optional RNG seed for reproducibility.
- write_prompt
Write a README_FOR_LLM.txt next to the data? Default TRUE.
- zip
Create a single zip archive containing data + schema + README? Default FALSE.
- prompt_filename
Name for the README file. Default "README_FOR_LLM.txt".
- zip_filename
Optional custom name for the ZIP file (no path). If NULL (default), it is derived as paste0(filename, ".zip"), e.g. "demo_bundle.zip".
- sensitive
Character vector of column names to treat as sensitive (optional).
- sensitive_detect
Logical; if TRUE, auto-detect common sensitive columns (id/email/phone). Default TRUE.
- sensitive_strategy
"fake" (replace with realistic fakes) or "drop" (remove the sensitive columns). Default "fake".
- normalize
Logical; if TRUE, attempt light auto-normalization before faking. Default FALSE.
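The naming rules described for filename and zip_filename can be sketched in base R as below. expected_bundle_files() is a hypothetical helper written for this illustration, not part of the package. It covers only the data files and the ZIP archive, since the schema file name is not stated here, and it assumes each format string is used directly as the file extension.

```r
# Illustrative only: mirrors the documented naming rules in base R.
# expected_bundle_files() is a hypothetical helper, not a package function.
expected_bundle_files <- function(filename,
                                  formats = c("csv", "rds"),
                                  zip = FALSE,
                                  zip_filename = NULL) {
  # One data file per requested format: "<filename>.<format>"
  data_files <- paste0(filename, ".", formats)
  if (!zip) {
    return(data_files)
  }
  # When no custom ZIP name is given, derive it from the base file name
  if (is.null(zip_filename)) {
    zip_filename <- paste0(filename, ".zip")
  }
  c(data_files, zip_filename)
}

expected_bundle_files("demo_bundle", zip = TRUE)
expected_bundle_files("demo_bundle", zip = TRUE, zip_filename = "share_me.zip")
```

With the defaults, the first call lists "demo_bundle.csv", "demo_bundle.rds", and "demo_bundle.zip"; the second swaps the archive name for the custom one.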