Skip to contents

Overview

hvtiRutilities provides utility functions for working with clinical research data at the Cleveland Clinic Heart, Vascular and Thoracic Institute (HVTI) Clinical Outcomes Registries and Research (CORR) department. The package simplifies common data preparation tasks when working with SAS datasets in R.

Main Functions

  • r_data_types(): Automatically infer and convert data types in a dataset
    • Converts character columns to factors
    • Detects binary numeric variables (0/1) and converts to logical
    • Converts numeric variables with few unique values to factors
    • Handles various NA representations (“NA”, “na”, etc.)
    • Preserves variable labels from SAS/labelled data
  • label_map(): Extract variable labels from labeled datasets
    • Creates a lookup table mapping variable names to their labels
    • Useful for working with SAS datasets that have variable labels
    • Returns a data frame with key (variable name) and label columns
  • sample_data(): Generate sample datasets for testing
    • Creates datasets with various column types for testing package functions
    • Useful for examples and unit tests

Installation

You can install the development version of hvtiRutilities from GitHub with:

# install.packages("pak")
pak::pak("ehrlinger/hvtiRutilities")

Usage Examples

Automatic Type Conversion

library(hvtiRutilities)

# Create sample data
dta <- sample_data(n = 100)

# Examine original types
str(dta)
# boolean: int (values: 1, 2)
# logical: chr (values: "F", "T")
# char: chr (values: "male", "female")

# Apply automatic type conversion
dta_converted <- r_data_types(dta)

# Examine converted types
str(dta_converted)
# boolean: logi (binary 1/2 → TRUE/FALSE)
# logical: Factor (character → factor)
# char: Factor (character → factor)

Skip Specific Columns

# Skip conversion for specific variables
dta_partial <- r_data_types(dta, skip_vars = c("boolean", "char"))
# boolean and char remain unchanged, others are converted

Control Factor Creation

# Convert only variables with fewer than 5 unique values to factors
dta_strict <- r_data_types(dta, factor_size = 5)

# Keep binary variables as factors instead of logical
dta_factors <- r_data_types(dta, binary_factor = TRUE)

Working with Variable Labels

# Create labeled data (common with SAS imports)
library(labelled)
dta <- data.frame(
  age = c(25, 30, 35),
  sex = c("M", "F", "M"),
  bp = c(120, 130, 125)
)
var_label(dta$age) <- "Patient Age (years)"
var_label(dta$sex) <- "Patient Sex"
var_label(dta$bp) <- "Systolic Blood Pressure (mmHg)"

# Extract labels as a lookup table
labels <- label_map(dta)
print(labels)
#   key                              label
# 1 age        Patient Age (years)
# 2 sex                    Patient Sex
# 3  bp  Systolic Blood Pressure (mmHg)

# Use for matching/joining
summary_table <- data.frame(variable = c("age", "bp"))
summary_table$label <- labels$label[match(summary_table$variable, labels$key)]

Key Features

  • Preserves variable labels: All functions maintain SAS/labelled variable attributes
  • Handles NA variants: Automatically converts “NA”, “na”, “Na”, “nA” strings to actual NA values
  • Type-safe: Returns the same data structure class as input (data.frame, tibble, data.table, etc.)
  • Flexible control: Multiple parameters to customize type conversion behavior

Getting Help

License

GPL (>= 3)