CS2c: Celebrating Our Technical Expertise

Conference: Women in Statistics and Data Science 2022
10/07/2022: 10:00 AM - 11:30 AM CDT
Concurrent 
Room: Grand Ballroom Salon E 

Chair

Emily Leary, University of Missouri

Presentations

One size fits all: A generalized algorithm for fitting GLMs with censored predictors in R

Ever since the discovery of therapies that target the genetic root cause of Huntington disease, researchers have worked to test whether these therapies can slow or halt disease symptoms. A first step toward this goal is modeling how symptoms progress, so that the best time to initiate therapy can be identified. Because symptoms are most detectable before and after a clinical diagnosis, progression is naturally modeled relative to the time of diagnosis. This has been problematic, however, because the time to clinical diagnosis is often censored (i.e., for patients who have not yet been diagnosed). The result is a pressing statistical challenge: modeling how symptoms (the outcome) change before and after the time to clinical diagnosis (a censored predictor). Strategies to tackle this challenge include fitting a generalized linear model with a censored covariate by maximum likelihood estimation. Still, implementing these models can be taxing, because each new setting (e.g., a different outcome model or a different distribution for the censored predictor) requires deriving a new algorithm. To address this, we have created the glmCensRd package, which provides generalized linear model fitting functions for a multitude of outcome and (censored) predictor specifications and for various random and non-random censoring types. The glmCensRd package makes fitting generalized linear models with censored predictors in R as accessible as fitting them without. We provide multiple intuitive examples and demonstrate the package's impact in fitting a variety of clinically meaningful models to data that are currently being used to design clinical trials for Huntington disease.
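The following minimal Python sketch makes the maximum likelihood idea concrete. It shows the log-likelihood such a fit maximizes in one simple setting. It is not glmCensRd's interface (the package is written in R), and the function name `loglik` and its parameter order are hypothetical. The sketch rests on three assumptions:

- The outcome model is normal and linear: Y | X ~ N(b0 + b1·X, s²).
- The censored predictor is normal: X ~ N(mu, sx²).
- Right censoring is noninformative, so only W = min(X, C) and the indicator delta = 1{X ≤ C} are observed.

A censored row contributes the integral of f(y | x) f(x) over x > w. In this normal-normal case that integral reduces to f(y) · P(X > w | Y = y). Other outcome and predictor combinations generally have no such closed form, which is the setting-by-setting derivation the abstract refers to.

```python
import math
from statistics import NormalDist


def loglik(params, y, w, delta):
    """Hypothetical log-likelihood sketch (not the glmCensRd API).

    Assumed model:
      Y | X ~ Normal(b0 + b1 * X, s^2)   (outcome model)
      X     ~ Normal(mu, sx^2)           (censored predictor)
    Right censoring is assumed noninformative: we observe W = min(X, C)
    and delta = 1 if X <= C (X observed exactly), else 0 (only X > W known).
    """
    b0, b1, s, mu, sx = params
    x_dist = NormalDist(mu, sx)
    # Integrating X out of the bivariate normal gives Y's marginal moments.
    mean_y = b0 + b1 * mu
    var_y = s ** 2 + b1 ** 2 * sx ** 2
    y_dist = NormalDist(mean_y, math.sqrt(var_y))
    # SD of X | Y = y; it does not depend on the observed y.
    cond_sd = math.sqrt(sx ** 2 - (b1 * sx ** 2) ** 2 / var_y)

    total = 0.0
    for yi, wi, di in zip(y, w, delta):
        if di == 1:
            # Uncensored row: log f(y | x) + log f(x), evaluated at x = w.
            total += math.log(NormalDist(b0 + b1 * wi, s).pdf(yi))
            total += math.log(x_dist.pdf(wi))
        else:
            # Censored row: log of the integral of f(y | x) f(x) over x > w,
            # which for this model equals log f(y) + log P(X > w | Y = y).
            cond_mean = mu + b1 * sx ** 2 / var_y * (yi - mean_y)
            surv = 1.0 - NormalDist(cond_mean, cond_sd).cdf(wi)
            total += math.log(y_dist.pdf(yi)) + math.log(surv)
    return total
```

To fit the model, you could pass the negative of `loglik` to a general-purpose optimizer such as `scipy.optimize.minimize`. Keep `s` and `sx` positive, for example by optimizing their logarithms.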

Presenting Author

Sarah Lotspeich, Wake Forest University

First Author

Sarah Lotspeich, Wake Forest University

CoAuthor(s)

Tanya Garcia, University of North Carolina at Chapel Hill
Peter Guan, University of North Carolina at Chapel Hill

Sequential Pattern Mining of Electronic Health Records for Early Diagnosis of Amyotrophic Lateral Sclerosis

Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease that primarily affects the upper and lower motor neurons. The average survival time for ALS patients is 19 months from diagnosis and 30 months from symptom onset. ALS is diagnosed primarily through clinical evaluation, together with a series of tests to rule out diseases that mimic it. Clinical diagnosis remains challenging, with an average diagnostic delay of 11 to 12 months or more after symptom onset. Early diagnosis of ALS is therefore critical for prolonging survival and improving quality of life. Our earlier work indicates that detecting ALS early from electronic health records (EHR) with a sequential pattern mining algorithm can reach the sensitivity and specificity needed to serve as an assistive diagnostic tool.

In this study, we further develop the early ALS detection algorithm for the population of patients with suspected ALS who require diagnosis by a neurologist. Our objectives are to improve the algorithm's accuracy to more than 80% sensitivity at a specificity of at least 90%, and to reduce its complexity so that it is explainable. The algorithm is validated on an independent EHR dataset, and the design of a prospective clinical validation study of the algorithm is described.
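The abstract does not specify which sequential pattern mining algorithm is used. As a hedged illustration, the Python sketch below shows the basic operation that such algorithms (e.g., GSP or PrefixSpan) build on:

- It checks whether an ordered pattern of EHR events occurs in a patient's event sequence.
- It computes the pattern's support, the fraction of patients whose records contain it.

Contrasting support between ALS cases and controls with mimicking diseases is one way a pattern could become a diagnostic feature. The function names and event labels are hypothetical. Each event is treated as a single item, a simplification of the itemsets of simultaneous codes that full algorithms handle.

```python
from typing import Sequence


def contains_pattern(events: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `events` in order, not necessarily contiguously."""
    remaining = iter(events)
    # `item in remaining` consumes the iterator up to and including the first
    # match, so each later pattern item must appear after the previous one.
    return all(item in remaining for item in pattern)


def support(patients: Sequence[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of patient event sequences that contain `pattern`."""
    if not patients:
        return 0.0
    return sum(contains_pattern(p, pattern) for p in patients) / len(patients)


# Hypothetical, time-ordered event sequences (one list per patient).
als_cases = [
    ["fall", "muscle_weakness", "neurology_visit", "emg_test"],
    ["muscle_weakness", "emg_test"],
    ["muscle_weakness", "fall", "neurology_visit"],
]
controls = [
    ["emg_test", "muscle_weakness"],
    ["fall"],
    ["neurology_visit", "muscle_weakness"],
]
pattern = ["muscle_weakness", "emg_test"]
print(support(als_cases, pattern), support(controls, pattern))
# prints 0.6666666666666666 0.0
```

A full mining algorithm would enumerate candidate patterns and keep those whose support clears a threshold. That search is omitted here.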

Presenting Author

Lily Sun

First Author

Lily Sun

CoAuthor(s)

Cindy Liang, Texas Academy of Mathematics and Science
Tianran Song, Rutgers Preparatory

Submodel Approximation under Preconditioning Outcome Approach

Clinical prediction models are widely acknowledged as informative tools that provide evidence-based support for clinical decision-making. However, such models are often underused in clinical practice for many reasons, including missing information for a new patient. Motivated by a study to implement a prediction model (STRATIFY) in the clinical workflow of an emergency department, we propose a novel submodel estimation approach to address missing information in real time. For prediction models such as STRATIFY that were developed using the "preconditioning outcome" approach, the proposed submodel coefficients are shown to equal the original prediction model coefficients plus a correction factor. This factor corresponds to the orthogonal projection of the missing components of the preconditioning outcome onto the range space of the non-missing information. Comprehensive simulations were conducted to assess the performance of the proposed estimation approach and to compare it with an existing "one-step-sweep" approach. The comparison used performance measures including the C-index, negative and positive predictive values (NPV, PPV), calibration intercept and slope, Brier score, and root mean squared prediction error (rMSPE). The proposed approach was applied to electronic health record (EHR) data from the Emergency Department at Vanderbilt University Medical Center to develop submodels for STRATIFY, which will subsequently be embedded in the STRATIFY clinical decision support tool for real-time implementation.
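The coefficient equivalence described in this abstract can be written out under simplifying assumptions. This is a sketch, not the authors' derivation. It assumes the following:

- The preconditioned outcome is the original model's linear predictor.
- Submodels are fit to that outcome by ordinary least squares.
- The development design matrix splits into the columns available for a new patient, X_o (including the intercept, with X_o^T X_o invertible), and the columns that are missing, X_m.
- The corresponding original coefficients are beta_o and beta_m.

All of this notation is introduced here for illustration.

```latex
\begin{aligned}
\tilde{y} &= X_o \hat{\beta}_o + X_m \hat{\beta}_m, \\
\hat{\gamma}_o &= (X_o^\top X_o)^{-1} X_o^\top \tilde{y}
  = \hat{\beta}_o + (X_o^\top X_o)^{-1} X_o^\top X_m \hat{\beta}_m, \\
X_o \hat{\gamma}_o &= X_o \hat{\beta}_o + P_o X_m \hat{\beta}_m,
  \qquad P_o = X_o (X_o^\top X_o)^{-1} X_o^\top .
\end{aligned}
```

Under these assumptions, the second term of the submodel coefficients is the correction. It holds the coefficients of the orthogonal projection of the missing part of the preconditioned outcome, X_m beta_m, onto the column space of X_o. Because it uses only development-data quantities, it could be precomputed for each missingness pattern.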

Presenting Author

Tianyi Sun, Vanderbilt University

First Author

Tianyi Sun, Vanderbilt University

CoAuthor(s)

Dandan Liu, Vanderbilt University Medical Center