Prepare Input Data: Coerce to data.frame and (optionally) normalize values
Source: R/prepare_input_data.R
Converts common tabular objects to a base data.frame and, if normalize = TRUE, applies light, conservative value normalization:
- Converts common date/time strings to POSIXct (best-effort across several formats)
- Converts percent-like character columns (e.g. "85%") to numeric (85)
- Maps a configurable set of "NA-like" strings to NA, while keeping common survey responses like "not applicable" or "prefer not to answer" as real levels
- Normalizes yes/no character columns to an ordered factor c("no","yes")
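The NA-mapping and yes/no rules above can be illustrated with a small base-R sketch. This is not the package's implementation: the helper names `map_na_like` and `normalize_yes_no` are hypothetical, and exact (case-sensitive) matching of NA-like strings is an assumption. Only the case-insensitive matching of protected survey responses comes from the documentation.

```r
# Illustrative base-R sketch; NOT the package's actual code.
na_strings     <- c("", "NA", "N/A", "na", "No data", "no data")
keep_as_levels <- c("not applicable", "prefer not to answer", "unsure")

# Replace NA-like strings with NA, but never touch values listed in
# keep_as_levels (compared case-insensitively).
map_na_like <- function(x, na_strings, keep_as_levels) {
  protected <- tolower(trimws(x)) %in% tolower(keep_as_levels)
  x[x %in% na_strings & !protected] <- NA_character_
  x
}

# Turn a pure yes/no character vector into an ordered factor with no < yes.
normalize_yes_no <- function(x) {
  lowered <- tolower(trimws(x))
  if (!all(lowered[!is.na(lowered)] %in% c("yes", "no"))) {
    return(x)  # not a yes/no column; leave it unchanged
  }
  factor(lowered, levels = c("no", "yes"), ordered = TRUE)
}

answers <- c("Yes", "no", "N/A", "not applicable", "")
cleaned <- map_na_like(answers, na_strings, keep_as_levels)
# "N/A" and "" become NA; "not applicable" survives as a real value

consent <- normalize_yes_no(c("Yes", "no", NA, "YES"))
```

Keeping the protected responses out of the NA mapping means they remain available as analyzable categories rather than being silently merged with missing data.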
Usage
prepare_input_data(
  data,
  normalize = TRUE,
  na_strings = c("", "NA", "N/A", "na", "No data", "no data"),
  keep_as_levels = c("not applicable", "prefer not to answer", "unsure"),
  percent_detect_threshold = 0.6,
  datetime_formats = c("%m/%d/%Y %H:%M:%S", "%m/%d/%Y %H:%M",
    "%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M", "%m/%d/%Y", "%Y-%m-%d")
)
Arguments
- data
An object coercible to data.frame (data.frame, tibble, data.table, matrix, list, etc.).
- normalize
Logical; whether to run the value normalization step (default TRUE).
- na_strings
Character vector of strings that should become NA (default: c("", "NA", "N/A", "na", "No data", "no data")).
- keep_as_levels
Character vector of strings that should be kept as values (not NA), e.g., survey choices (default: c("not applicable", "prefer not to answer", "unsure")). Matching is case-insensitive.
- percent_detect_threshold
Proportion of non-missing values that must contain % before a character column is converted to numeric (default 0.6).
- datetime_formats
Candidate formats tried (in order) when parsing date-time strings. The best-fitting format (most successful parses) is used. Defaults cover mm/dd/yyyy HH:MM(:SS)?, ISO-8601, and date-only.
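The percent_detect_threshold and datetime_formats rules can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: the helpers `percent_to_numeric` and `parse_best_format` are hypothetical, parsing in UTC is an assumption, and breaking ties in favor of the earlier format is one plausible reading of "tried (in order)".

```r
# Illustrative base-R sketch; NOT the package's actual code.

# Convert a character column to numeric only when the share of non-missing
# values containing "%" reaches the threshold; otherwise return it unchanged.
percent_to_numeric <- function(x, threshold = 0.6) {
  present <- x[!is.na(x)]
  if (length(present) == 0 ||
      mean(grepl("%", present, fixed = TRUE)) < threshold) {
    return(x)
  }
  suppressWarnings(as.numeric(trimws(gsub("%", "", x, fixed = TRUE))))
}

# Try every candidate format on the whole vector and keep the result that
# parsed the most values. which.max() returns the first maximum, so ties go
# to the format listed earlier (assumed tie-breaking rule).
parse_best_format <- function(x, formats, tz = "UTC") {
  parsed <- lapply(formats, function(f) {
    as.POSIXct(strptime(x, format = f, tz = tz))
  })
  hits <- vapply(parsed, function(p) sum(!is.na(p)), integer(1))
  parsed[[which.max(hits)]]
}

rates <- percent_to_numeric(c("85%", "40%", "12", NA))  # 2 of 3 contain "%"
mixed <- percent_to_numeric(c("85%", "a", "b"))         # 1 of 3: left as is

stamps  <- c("03/15/2024 14:30", "12/01/2023 09:05", "not a date")
formats <- c("%m/%d/%Y %H:%M:%S", "%m/%d/%Y %H:%M", "%Y-%m-%d")
parsed  <- parse_best_format(stamps, formats)
```

Format order matters in a scheme like this because strptime() ignores trailing input: a date-only format also "succeeds" on a full timestamp, so listing the more specific formats first keeps the time component.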