John Ehrlinger, PhD

About

I am an applied statistician and data scientist with expertise in computational statistics, machine learning, and deep learning, and in their application to cardiovascular and clinical outcomes research. My work spans both the theoretical development of machine learning methods and their direct translation into clinical practice through open-source software and reproducible analytical workflows.

I returned to Cleveland Clinic in December 2024 as Assistant Staff, Lead Data Scientist in the Department of Cardiothoracic Surgery, Heart, Vascular & Thoracic Institute, where I lead a team of data engineers and data scientists in the Cardiovascular Outcomes, Registries and Research (CORR) group. My team supports registry-based cardiovascular outcomes research, drives statistical methods research applied to observational clinical data, and establishes best practices in software engineering and reproducible science.

Before returning to Cleveland Clinic, I was Senior Data and Applied Scientist at Microsoft Azure (2015–2023), where I led ML/AI customer engagements across oil & gas, aerospace, and medical devices. I then served as Senior Data Scientist – Technical Lead at Altamira Technologies (2023–2024), supporting USAF analytics on secure networks. I hold a PhD in Statistics from Case Western Reserve University (2011), where my dissertation established rigorous theoretical properties of ℓ₂Boosting, as well as earlier graduate degrees in Mechanical and Aerospace Engineering.

Research Interests

Random forest and ensemble methods · Survival and time-to-event analysis · Longitudinal and temporal data modeling · Deep learning for clinical outcomes · Parametric hazard modeling · Regularization and gradient boosting theory · Reproducible clinical research workflows · Open-source statistical software

Current focus: open-source implementations of multi-phase hazard analysis methods (the hazard SAS/C package and its TemporalHazard R port), and machine learning for waitlist and post-transplant survival in advanced heart failure.

Open-Source Software

ggRandomForests

Visualizing random forest models; graphical analysis of survival, regression, and classification forests.

hvtiBoostmtree

Boosted multivariate trees for longitudinal data; an extended fork of boostmtree.

hazard

SAS and C implementation of multi-phase hazard analysis for time-to-event decomposition. (Maintainer)

TemporalHazard

R port of the C code underlying the Hazard SAS module; pre-release.

hvtiPlotR

HVTI-standard publication graphics for reproducible clinical research figures.

hvtiRtables WIP

Manuscript-compliant Word tables from gtsummary objects, following HVTI CORR table construction standards.

hvtiRutilities WIP

Utility functions supporting reproducible HVTI research workflows; in active development.

HVTI Graphics Recipes Book

Catalog of publication figures for clinical outcomes research (Kaplan–Meier curves, propensity-score balance plots, CONSORT diagrams, random-forest visualizations), each paired with reproducible code. Quarto book, CC BY 4.0.

Talks

Care and Feeding of Your Biostats Team: Scaling Best Practices in a Large Hybrid SAS/R Team
R/Medicine 2026

How CORR turns workflows that once lived in one person's head into shared, reliable practice: Quarto, renv, the internal hvti* package stack, and automated checks on 25 years of naming conventions, so biostatisticians can focus on the science.

Contact

✉ john.ehrlinger@gmail.com ✉ ehrlinj@ccf.org 🔗 github.com/ehrlinger 💼 linkedin.com/in/ehrlinger 🧬 ORCID 0000-0002-5340-5154