Reads a manifest.yaml produced by update_manifest() and,
for every entry, confirms that (a) the file exists, (b) its SHA-256
checksum matches the recorded value, and (c) its row count matches the recorded row count.
Automatic row-count verification is supported for CSV
(.csv), SAS (.sas7bdat), and Excel (.xlsx,
.xls) files. For other file types the row-count check is skipped and
only the existence and SHA-256 checks are applied.
Call this function at the top of every analysis script or Quarto document to ensure data integrity before any results are generated.
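The three per-entry checks above can be sketched as follows. This is a hypothetical illustration, not the hvtiRutilities source: check_entry() and count_rows() are made-up helpers, the manifest fields file, sha256 and rows are assumed names rather than the documented schema, hashing uses digest::digest(), and the SAS and Excel branches assume the haven and readxl packages are installed.

```r
# Hypothetical sketch of the per-entry checks; NOT the hvtiRutilities code.
# Assumed manifest entry fields: file, sha256, rows.

# Row count for formats with a known reader; NA means "skip this check".
# Only the branch that matches is evaluated, so haven/readxl are needed
# only when such files are present.
count_rows <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv      = nrow(utils::read.csv(path)),
    sas7bdat = nrow(haven::read_sas(path)),
    xlsx     = ,                                # falls through to xls
    xls      = nrow(readxl::read_excel(path)),
    NA_integer_                                 # any other extension
  )
}

check_entry <- function(entry, data_dir) {
  path <- file.path(data_dir, entry$file)
  result <- function(status, reason = NA_character_) {
    data.frame(file = entry$file, status = status, reason = reason,
               stringsAsFactors = FALSE)
  }

  # (a) The file must exist.
  if (!file.exists(path)) return(result("FAIL", "file not found"))

  # (b) The SHA-256 of the file's bytes must match the recorded value.
  sha <- digest::digest(path, algo = "sha256", file = TRUE)
  if (!identical(sha, entry$sha256)) return(result("FAIL", "SHA-256 mismatch"))

  # (c) Row count, only when the format has a reader and a count was recorded.
  n <- count_rows(path)
  if (!is.na(n) && !is.null(entry$rows) && n != entry$rows) {
    return(result("FAIL", sprintf("row count %d, expected %d",
                                  n, as.integer(entry$rows))))
  }
  result("PASS")
}
```

An NA from count_rows() means the format has no row-count reader, so only checks (a) and (b) apply, mirroring the fallback described above.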
Arguments
- manifest_path
Character. Path to the manifest YAML file. Defaults to
"manifest.yaml" in the current working directory.
- data_dir
Character. Directory in which to look for the dataset files. When
NULL (the default), the directory containing manifest_path is used.
- stop_on_error
Logical. If TRUE (the default), the function calls stop() on the first failed check, preventing the analysis from proceeding. Set to FALSE to collect all failures and report them together as a warning.
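The data_dir default and the two stop_on_error modes can be sketched as a small driver. resolve_data_dir() and run_checks() are hypothetical helpers, check stands in for any per-entry checker returning a one-row data frame with file and status columns, and returning the report data frame is an assumption inferred from the report$status usage in the examples.

```r
# Hypothetical sketch of argument handling; NOT the hvtiRutilities code.

# data_dir = NULL falls back to the directory that holds the manifest.
resolve_data_dir <- function(manifest_path, data_dir = NULL) {
  if (is.null(data_dir)) dirname(manifest_path) else data_dir
}

# 'check' is any function(entry) returning a one-row data frame with
# 'file' and 'status' ("PASS"/"FAIL") columns (an assumed contract).
run_checks <- function(entries, check, stop_on_error = TRUE) {
  rows <- vector("list", length(entries))
  for (i in seq_along(entries)) {
    rows[[i]] <- check(entries[[i]])
    # stop_on_error = TRUE: abort on the first failure.
    if (stop_on_error && rows[[i]]$status == "FAIL") {
      stop("Manifest check failed for ", rows[[i]]$file, call. = FALSE)
    }
  }
  # stop_on_error = FALSE: every entry is checked, failures summarised once.
  report <- do.call(rbind, rows)
  n_fail <- sum(report$status == "FAIL")
  if (n_fail > 0) {
    warning(n_fail, " manifest check(s) failed", call. = FALSE)
  }
  invisible(report)
}
```

With stop_on_error = TRUE the loop halts at the first failure; with FALSE every entry is checked and a single warning summarises the failures, leaving the full report available for inspection.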
Examples
if (FALSE) { # \dontrun{
# --- Typical usage: top of every analysis script or .qmd -----------
hvtiRutilities::verify_manifest(here::here("manifest.yaml"))
# cohort_20240115.csv — SHA-256 match (n = 831)
# labs_20240115.sas7bdat — SHA-256 match (n = 1204)
# adjudication_20240115.xlsx — SHA-256 match (n = 47)
# --- Collect all failures instead of stopping on the first ---------
report <- hvtiRutilities::verify_manifest(
here::here("manifest.yaml"),
stop_on_error = FALSE
)
report[report$status == "FAIL", ]
} # }