John Ehrlinger, PhD

Assistant Staff, Lead Data Scientist
Cardiovascular Outcomes, Registries & Research (CORR)
Department of Cardiothoracic Surgery · Heart, Vascular & Thoracic Institute
Cleveland Clinic
Clinical Assistant Professor (Joint Appointment) · Cleveland Clinic Lerner College of Medicine

About

I am an applied statistician and data scientist with expertise in computational statistical machine learning, deep learning, and their application to cardiovascular and clinical outcomes research. My work spans both the theoretical development of machine learning methods and their direct translation into clinical practice through open-source software and reproducible analytical workflows.

I returned to Cleveland Clinic in December 2024 as Assistant Staff, Lead Data Scientist in the Department of Cardiothoracic Surgery, Heart, Vascular & Thoracic Institute, where I lead a team of data engineers and data scientists in the Cardiovascular Outcomes, Registries and Research (CORR) group. My team supports registry-based cardiovascular outcomes research, drives statistical methods research applied to observational clinical data, and establishes best practices in software engineering and reproducible science.

Before returning to Cleveland Clinic, I was Senior Data and Applied Scientist at Microsoft Azure (2015–2023), where I led ML/AI customer engagements across oil & gas, aerospace, and medical devices. I then served as Senior Data Scientist – Technical Lead at Altamira Technologies (2023–2024), supporting USAF analytics on secure networks. I hold a PhD in Statistics from Case Western Reserve University (2011), where my dissertation established rigorous theoretical properties of ℓ2Boosting, and earlier graduate degrees in Mechanical and Aerospace Engineering.

Research Interests

Random forest and ensemble methods · Survival and time-to-event analysis · Longitudinal and temporal data modeling · Deep learning for clinical outcomes · Parametric hazard modeling · Regularization and gradient boosting theory · Reproducible clinical research workflows · Open-source statistical software

Current focus: open-source implementations of multi-phase hazard analysis methods (the hazard SAS/C package and mixhazard R port), and machine learning for waitlist and post-transplant survival in advanced heart failure.

Open-Source Software

Visualizing random forest models; graphical analysis of survival, regression, and classification forests.
Boosted multivariate trees for longitudinal data.
hazard (Maintainer): SAS and C implementation of multi-phase hazard analysis for time-to-event decomposition.
mixhazard: R port of the C code underlying the Hazard SAS module; pre-release.
HVTI-standard publication graphics for reproducible clinical research figures.
Utility functions supporting reproducible HVTI research workflows.

Contact