Generate fake data with privacy controls — generate_fake_with_privacy

Generates a synthetic copy of data, then optionally detects and handles sensitive columns by name. Detection runs against the original column names; matches are mapped to the output columns through attr(fake, "name_map") when that attribute is present.

Usage

generate_fake_with_privacy(
  data,
  n = 30,
  level = c("low", "medium", "high"),
  seed = NULL,
  sensitive = NULL,
  sensitive_detect = TRUE,
  sensitive_strategy = c("fake", "drop"),
  normalize = TRUE,
  sensitive_patterns = NULL,
  sensitive_regex = NULL
)
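
A call might look like the following sketch. The input data frame `patients` and its column names are made up for illustration, and the call is guarded because it assumes the package that provides generate_fake_with_privacy() is already attached in the session.

```r
# Hypothetical input; the column names are chosen only for illustration.
patients <- data.frame(
  patient_id = 1:5,
  email = paste0("user", 1:5, "@example.com"),
  age = c(34, 51, 29, 62, 45),
  stringsAsFactors = FALSE
)

# Guarded call: runs only if the function is available in the session.
if (exists("generate_fake_with_privacy", mode = "function")) {
  fake <- generate_fake_with_privacy(
    patients,
    n = 10,                      # number of synthetic rows
    level = "medium",            # one of "low", "medium", "high"
    seed = 123,                  # fix the RNG seed for reproducibility
    sensitive = "patient_id",    # original column name to treat as sensitive
    sensitive_strategy = "drop"  # "fake" or "drop"
  )
  str(fake)
}
```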

Arguments

data: A data.frame (or coercible) to mirror.
n: Number of rows to generate. Defaults to 30; if NULL, the number of rows in the input is used.
level: One of "low","medium","high".
seed: Optional RNG seed.
sensitive: Character vector of original column names to treat as sensitive.
sensitive_detect: Logical; auto-detect common sensitive columns by name.
sensitive_strategy: One of "fake" or "drop".
normalize: Logical; lightly normalize inputs.
sensitive_patterns: Optional named list of patterns to treat as sensitive (e.g., list(id = "...", email = "...", phone = "...")). Overrides defaults.
sensitive_regex: Optional fully-combined regex (single string) to detect sensitive columns by name. If supplied, it is used instead of defaults.
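
As a rough illustration of name-based detection, the base R sketch below matches column names against a combined regex. The patterns, the example column names, and the case-insensitive matching are assumptions made for this example; they are not the package defaults, which are not listed here.

```r
# Hypothetical per-category patterns, shaped like a sensitive_patterns list.
patterns <- list(
  id = "(^|_)id$",
  email = "e[-_]?mail",
  phone = "phone|mobile"
)

# Join them into one alternation, similar to a value for sensitive_regex.
combined <- paste(unlist(patterns), collapse = "|")

# Match against the original column names (case-insensitive by assumption).
cols <- c("patient_id", "Email", "home_phone", "age", "visit_date")
flagged <- cols[grepl(combined, cols, ignore.case = TRUE)]
flagged
```

Testing a pattern this way against names(data) before passing it in can help catch over-broad matches, such as a bare "id" that would also match a column like "valid".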

Value

A data.frame with the attributes sensitive_columns, dropped_columns, and name_map.
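
The attributes can be read with attr(). The sketch below builds a stand-in object by hand so that it runs on its own. It assumes that name_map is a named character vector from original to output column names and that sensitive_columns and dropped_columns hold original names; the real shapes may differ.

```r
# Stand-in for a returned object, built by hand for illustration only.
fake <- data.frame(
  email = c("a@example.com", "b@example.com"),
  age = c(40, 31)
)
attr(fake, "sensitive_columns") <- c("Patient ID", "Email")
attr(fake, "dropped_columns") <- "Patient ID"
# Assumed shape: names are original column names, values are output names.
attr(fake, "name_map") <- c("Patient ID" = "patient_id", Email = "email", Age = "age")

sens <- attr(fake, "sensitive_columns")
dropped <- attr(fake, "dropped_columns")
name_map <- attr(fake, "name_map")

# Sensitive columns that were faked rather than dropped, as output names.
kept_sensitive <- setdiff(sens, dropped)
out_cols <- unname(name_map[kept_sensitive])
out_cols
```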
