
Generates a synthetic copy of data, then optionally detects and handles sensitive columns by name. Detection runs on the ORIGINAL column names, and the result is mapped to the output columns via attr(fake, "name_map") when that attribute is present.

Usage

generate_fake_with_privacy(
  data,
  n = 30,
  level = c("low", "medium", "high"),
  seed = NULL,
  sensitive = NULL,
  sensitive_detect = TRUE,
  sensitive_strategy = c("fake", "drop"),
  normalize = TRUE,
  sensitive_patterns = NULL,
  sensitive_regex = NULL
)

Arguments

data

A data.frame (or coercible) to mirror.

n

Number of rows to generate. The default is 30; if NULL, the number of rows in data is used.

level

One of "low", "medium", or "high".

seed

Optional RNG seed.

sensitive

Character vector of original column names to treat as sensitive.

sensitive_detect

Logical; auto-detect common sensitive columns by name.

sensitive_strategy

How sensitive columns are handled; one of "fake" or "drop".

normalize

Logical; lightly normalize inputs.

sensitive_patterns

Optional named list of patterns to treat as sensitive (e.g., list(id = "...", email = "...", phone = "...")). Overrides defaults.

sensitive_regex

Optional fully-combined regex (single string) to detect sensitive columns by name. If supplied, it is used instead of defaults.
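
The relationship between sensitive_patterns (a named list) and sensitive_regex (a single combined string) can be illustrated with base R. This is a minimal sketch of name-based detection, not the package's internal code: the pattern values, the column names, and the case-insensitive matching are all assumptions chosen for the example.

```r
# Hypothetical illustration only; not the package's implementation.
# Example patterns (assumed values, chosen for this sketch).
patterns <- list(
  id    = "(^|_)id$",
  email = "e-?mail",
  phone = "phone|mobile"
)

# Collapse the named list into one alternation: a single combined regex string.
combined <- paste(unlist(patterns), collapse = "|")

# Match against the ORIGINAL column names (case-insensitive by assumption).
original_names <- c("customer_id", "Email", "age", "mobile_number", "city")
is_sensitive <- grepl(combined, original_names, ignore.case = TRUE)

# Names flagged as sensitive by this sketch.
detected <- original_names[is_sensitive]
print(detected)
```

A string built this way is the kind of value that could be passed as sensitive_regex, while the list itself has the shape described for sensitive_patterns.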

Value

A data.frame with the attributes sensitive_columns, dropped_columns, and name_map.

Details

Generate fake data with privacy controls
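
Examples

A minimal usage sketch based on the signature in Usage. It assumes the function is exported by a package named FakeDataR (the package name does not appear on this page) and that the package is installed; the input data is made up, and the contents of the returned attributes are not shown because they depend on the package's detection rules.

```r
# Assumption: the function is exported by a package named FakeDataR.
library(FakeDataR)

# Small input with column names that look sensitive (made-up values).
df <- data.frame(
  id    = 1:3,
  email = c("a@example.com", "b@example.com", "c@example.com"),
  age   = c(31, 45, 27)
)

# Generate 5 synthetic rows and drop any columns detected as sensitive.
fake <- generate_fake_with_privacy(
  df,
  n = 5,
  level = "medium",
  seed = 42,
  sensitive_detect = TRUE,
  sensitive_strategy = "drop"
)

# Inspect the attributes documented under Value.
attr(fake, "sensitive_columns")
attr(fake, "dropped_columns")
attr(fake, "name_map")
```

Setting seed makes repeated runs comparable, and switching sensitive_strategy to "fake" keeps the sensitive columns in the output rather than dropping them.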