Create a fake-data bundle for LLM workflows — llm_bundle

Generates fake data, writes files (CSV/RDS/Parquet), writes a scrubbed JSON schema, and optionally writes a README prompt and a single ZIP file containing everything.

Usage

llm_bundle(
  data,
  n = 30,
  level = c("medium", "low", "high"),
  formats = c("csv", "rds"),
  path = tempdir(),
  filename = "fake_bundle",
  seed = NULL,
  write_prompt = TRUE,
  zip = FALSE,
  prompt_filename = "README_FOR_LLM.txt",
  zip_filename = NULL,
  sensitive = NULL,
  sensitive_detect = TRUE,
  sensitive_strategy = c("fake", "drop"),
  normalize = FALSE
)

Arguments

data: A data.frame (or an object coercible to one) to mirror.
n: Number of rows in the fake dataset (default 30).
level: Privacy level: "low", "medium" (default), or "high". Higher levels apply stricter defaults.
formats: Which data files to write: any of "csv", "rds", "parquet". Default c("csv", "rds").
path: Folder to write outputs. Default: tempdir().
filename: Base file name (no extension). Example: "demo_bundle". This becomes files like "demo_bundle.csv", "demo_bundle.rds", etc.
seed: Optional RNG seed for reproducibility.
write_prompt: Write a README_FOR_LLM.txt next to the data? Default TRUE.
zip: Create a single zip archive containing data + schema + README? Default FALSE.
prompt_filename: Name for the README file. Default "README_FOR_LLM.txt".
zip_filename: Optional custom name for the ZIP file (no path). If NULL (default), it is derived as paste0(filename, ".zip"), e.g. "demo_bundle.zip".
sensitive: Character vector of column names to treat as sensitive (optional).
sensitive_detect: Logical, auto-detect common sensitive columns (id/email/phone). Default TRUE.
sensitive_strategy: "fake" (replace with realistic fakes) or "drop". Default "fake".
normalize: Logical; if TRUE, attempt light auto-normalization before faking.
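
A minimal sketch of a call that combines several of the arguments above. The `patients` data frame, the `out_dir` folder, and the choice of `age` as a sensitive column are illustrative assumptions, not part of the package. The call is wrapped in an `exists()` guard so the snippet can be sourced even where the package providing `llm_bundle()` is not attached.

```r
# Illustrative input; any data.frame would do.
patients <- data.frame(
  patient_id = 1:5,
  email      = paste0("user", 1:5, "@example.com"),
  age        = c(34, 51, 27, 45, 62),
  stringsAsFactors = FALSE
)

# Write into a dedicated subfolder of the session temp directory.
out_dir <- file.path(tempdir(), "llm_bundle_demo")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

base_name <- "demo_bundle"

# With zip_filename = NULL, the archive name is documented as
# paste0(filename, ".zip").
expected_zip <- paste0(base_name, ".zip")

# Guard: only call llm_bundle() when it is actually available.
if (exists("llm_bundle", mode = "function")) {
  bundle <- llm_bundle(
    patients,
    n = 50,                      # rows in the fake dataset
    level = "high",              # strictest privacy level
    formats = c("csv", "rds"),
    path = out_dir,
    filename = base_name,
    seed = 123,                  # reproducible output
    zip = TRUE,                  # also write a single archive
    sensitive = "age",           # flag a column explicitly (assumed choice)
    sensitive_strategy = "drop"  # remove sensitive columns instead of faking
  )
}
```

Setting `seed` makes repeated runs comparable, and writing to a dedicated folder keeps the bundle files separate from anything else in `tempdir()`.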

Value

A list of output paths plus the generated data: $data_paths (named, one entry per format), $schema_path, $readme_path (optional), $zip_path (optional), and $fake (the fake data.frame).
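
The returned list can be flattened into one named vector of file paths, for example to check which files exist. The sketch below assumes the optional elements are absent or NULL when not written. `collect_bundle_paths()` is a hypothetical helper, and `res` is a hand-built stand-in with placeholder file names rather than real `llm_bundle()` output.

```r
# Hypothetical helper (not part of the package): gather every file path
# from a list shaped like the documented return value.
collect_bundle_paths <- function(res) {
  c(
    unlist(res$data_paths),    # named paths, one per written format
    schema = res$schema_path,
    readme = res$readme_path,  # dropped by c() when absent or NULL
    zip    = res$zip_path      # dropped by c() when absent or NULL
  )
}

# Stand-in result with placeholder file names (assumed for illustration).
res <- list(
  data_paths  = c(csv = "fake_bundle.csv", rds = "fake_bundle.rds"),
  schema_path = "fake_bundle_schema.json",
  fake        = data.frame(x = 1:3)
)

paths <- collect_bundle_paths(res)

# Paths that do not point at an existing file.
missing_files <- paths[!file.exists(paths)]
```

Because `c()` silently drops NULL arguments, the helper works whether or not the README and ZIP were requested.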

Details

Tips

Avoid using angle brackets in examples; prefer plain tokens like NAME or FILE_NAME. If you truly want bracket glyphs, use Unicode ones such as ⟨name⟩.