SPEED 7: Biostatistics and Applied Statistics, Part 1

Jiachen Lu, Chair
Merck & Co., Inc.
 
Wednesday, Aug 6: 8:30 AM - 10:20 AM
4150 
Contributed Speed 
Music City Center 
Room: CC-104A 

Presentations

Applications of Riemann Integration in Biostatistics

Riemann integration is a mathematical approach that offers distinct advantages over single-point analyses, making it a preferable basis for endpoints under certain conditions. A prominent application is estimation of the area under the curve (AUC), widely used in pharmacokinetic and pharmacodynamic analyses. Because these measurements are continuous but collected at discrete timepoints, Riemann integration is the most easily applied method for estimating the integral.

As an example, summed pain intensity (SPI) is calculated by applying the trapezoidal rule, a version of Riemann integration, to Numeric Pain Rating Scale (NPRS) measurements. Simulations on this endpoint show reductions in the coefficient of variation relative to single-point analysis when scores vary between timepoints, and consequently an increased statistical effect size.

This methodology can be applied to additional endpoints, enhancing robustness by aggregating continuous data across multiple time points. 
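As a concrete illustration, the trapezoidal-rule Riemann sum underlying an SPI-style AUC takes only a few lines; the timepoints and NPRS scores below are hypothetical values for illustration:

```python
import numpy as np

# Hypothetical NPRS pain scores (0-10) recorded at discrete timepoints (hours)
times = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
scores = np.array([8.0, 6.0, 5.0, 3.0, 2.0])

# Trapezoidal-rule Riemann sum: each interval contributes its width times
# the average of the two endpoint scores
spi = np.sum((scores[:-1] + scores[1:]) / 2 * np.diff(times))
print(spi)  # 30.5
```

Unlike a left- or right-endpoint rectangle sum, the trapezoidal rule handles the unequal spacing of clinical assessment times without favoring either endpoint.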

Keywords

Endpoint

Power

Pharmacodynamics

Pharmacokinetics 

Co-Author(s)

Clay Dehn, Evolution Research Group
William Martin, Lotus Clinical Research
Mark Jaros, Summit Analytical

First Author

Lance Ballester, Lotus Clinical Research

Presenting Author

Lance Ballester, Lotus Clinical Research

Automatic recognition of heart disease based on phonocardiogram

Heart sound recognition is crucial for early cardiovascular disease detection, but auscultation alone often leads to diagnostic challenges, even for experienced clinicians. To address this, we propose a convolutional recurrent neural network (CRNN) model combined with machine learning, utilizing MFCC, STFT, and deep scattering features. Applied to 512 datasets from E-Da Hospital, our CRNNA + LightGBM model achieved 92.2% accuracy (specificity: 96.2%, sensitivity: 88%), outperforming physicians by 9.7% in accuracy and 24% in sensitivity.

Using self-attention mechanisms, we visualized the model's focus areas, which closely matched physicians' auscultation regions, demonstrating its ability to act as a diagnostic proxy. Validation on the 2016 PhysioNet/CinC Challenge database further confirmed the model's robustness, achieving 95% accuracy (specificity: 93%, sensitivity: 98%). 

Keywords

CRNNA

Deep scattering

Heart sound classification

Light GBM

MFCC

PCG 

Co-Author(s)

Ting-Yu Yan, Department of Applied Mathematics, National Sun Yat-sen University
Yu-Jung Huang, I-Shou University
Ming-chun Yang, Department of Pediatrics, E-Da Hospital, Kaohsiung, Taiwan
Wei-Chen Lin, Department of Medical Research, E-Da Hospital

First Author

Meihui Guo, National Sun Yat-Sen University

Presenting Author

Meihui Guo, National Sun Yat-Sen University

Dose-Dependent Microglial Depletion with PLX5622 in Mice

Dysregulated neuroinflammation is hypothesized to be a leading contributor to neurodegenerative diseases. Microglia, the immune cells of the brain, are crucial in maintaining tissue homeostasis and driving neuroinflammation. Microglia depend on colony-stimulating factor 1 receptor (CSF1R) signaling to survive. CSF1R inhibitors (e.g., PLX5622) are used to deplete microglia in the brain, providing valuable tools for studying microglial dynamics. In this in vivo pharmacology study, we investigate dose-dependent microglial depletion by PLX5622 in wild-type mice (n=48). Six groups of mice (8 mice each: 4 males and 4 females) were treated for 4 weeks with various drug doses (0, 100, 300, 600, 900, or 1200 mg/kg). Whole-brain sections immunostained with Iba1 will be used to quantify microglial depletion and analyzed via one-way ANOVA, with Tukey's post-hoc test to assess dose differences. In surviving microglia, morphological phenotypes, branch length, and soma size will be analyzed using multivariate analysis of variance (MANOVA) and clustering techniques to identify dose-dependent differences. These findings will contribute to understanding microglial dynamics in response to CSF1R inhibition. 
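The planned ANOVA-plus-Tukey analysis can be sketched with standard SciPy tools. The cell counts below are simulated, not study data, and the assumed dose-effect slope and noise level are invented for illustration:

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

rng = np.random.default_rng(0)
doses = [0, 100, 300, 600, 900, 1200]  # mg/kg, matching the study's dose groups
# Simulated Iba1+ cell counts with dose-dependent depletion (n=8 per group)
groups = [rng.normal(loc=200 - 0.12 * d, scale=15, size=8) for d in doses]

# One-way ANOVA across the six dose groups
f_stat, p_val = f_oneway(*groups)
print(f"F = {f_stat:.1f}, p = {p_val:.2g}")

# Tukey's HSD post-hoc test for all pairwise dose comparisons
res = tukey_hsd(*groups)
print(res.pvalue[0, 5])  # 0 mg/kg vs. 1200 mg/kg comparison
```

`scipy.stats.tukey_hsd` returns the full matrix of pairwise p-values, so specific dose contrasts can be read off directly.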

Keywords

Microglial

CSF1R inhibition

Dose-dependent depletion

One-way ANOVA

Tukey’s post-hoc test 

Co-Author(s)

Yumary Rubio, Institute for Neurodegenerative Diseases, Weill Institute for Neurosciences, UCSF
Stephanie Huard, Institute for Neurodegenerative Diseases, Weill Institute for Neurosciences, UCSF
Suzanne Dufault, Department of Epidemiology and Biostatistics, UCSF
Carlo Condello, Institute for Neurodegenerative Diseases, Weill Institute for Neurosciences, UCSF

First Author

Nya Campbell, Department of Epidemiology and Biostatistics

Presenting Author

Nya Campbell, Department of Epidemiology and Biostatistics

Equivalence of Generalized Bivariate Bernoulli Dependency Test & Re-parameterize Logistic Regression

Background: Binary endpoints at two timepoints (e.g., pre- vs. post-treatment) are common in healthcare research. The Generalized Bivariate Bernoulli Model (GBBM) is a specialized GLM for bivariate binary data but lacks software for direct analysis. Additionally, the original comparison of the GBBM dependency test to regressive logistic regression is flawed.
Methods: We propose a re-parameterized logistic regression model, proving its equivalence to the GBBM dependency test theoretically and empirically. Simulations compare the power of the GBBM test with a) the regressive logistic model, b) our re-parameterized logistic model, and c) the Pearson Chi-square test. We also analyze infant mortality data from BDHS.
Results: The GBBM test's power differs from the regressive logistic model but matches our re-parameterized logistic model across effect and sample sizes.
Conclusion: This study refines dependency analysis in bivariate binary data, enhancing accessibility for researchers. 
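The kind of equivalence at issue can be illustrated on a toy 2x2 table: for a single binary predictor, the likelihood-ratio test of the logistic-regression slope has a closed form and coincides with the likelihood-ratio (G) chi-square test of independence. This is a generic sketch with made-up counts, not the GBBM dependency test itself:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table of (pre, post) binary outcomes
#                 post=0  post=1
table = np.array([[60, 20],    # pre = 0
                  [30, 90]])   # pre = 1

# Likelihood-ratio (G) chi-square test of independence
g_stat, g_p, _, _ = chi2_contingency(table, correction=False,
                                     lambda_="log-likelihood")

# Logistic regression of post on pre: with one binary predictor, the MLE
# fitted probabilities are the row proportions, so the LR test of the
# slope can be computed directly without a solver.
n = table.sum()
m = table.sum(axis=1)            # subjects per pre group
k = table[:, 1]                  # post=1 counts per pre group
p_full = k / m                   # fits from the model with the pre term
p_null = k.sum() / n             # intercept-only fit

def bern_ll(p, k, m):
    # Bernoulli log-likelihood for k successes out of m trials
    return k * np.log(p) + (m - k) * np.log(1 - p)

lrt = 2 * (bern_ll(p_full, k, m).sum() - bern_ll(p_null, k, m).sum())
print(np.isclose(lrt, g_stat))  # True: the two statistics coincide
```

The same logic underlies proving a re-parameterized regression test equivalent to a dependency test: show the two likelihoods differ only by terms that cancel in the ratio.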

Keywords

Longitudinal binary endpoints

generalized linear models

repeated measures 

Co-Author(s)

Devin Koestler, University of Kansas Medical Center
Yanming Li

First Author

Kazi Md Farhad Mahmud

Presenting Author

Kazi Md Farhad Mahmud

Evaluation of Novel and Traditional Anthropometric Indices for Predicting Metabolic Syndrome

Various anthropometric indices have been proposed to assess central obesity and predict metabolic syndrome (MetS). This presentation aims to compare the predictive potential of anthropometric indices for MetS and its components. Among Nepalese adults, the visceral adiposity index (VAI) and lipid accumulation product (LAP) outperformed traditional measures such as body mass index (BMI), waist-to-hip ratio (WHR), and waist-to-height ratio (WHtR) in predicting MetS and its components. Optimal cutoffs were as follows: VAI > 1.97 (females), > 2.16 (males); LAP > 53.4 (both sexes); WHR > 0.98 (both sexes); WHtR > 0.638 (females), > 0.56 (males); body roundness index (BRI) > 5.76 (females), > 4.75 (males). A body shape index (ABSI) and the body adiposity index (BAI) exhibited the poorest diagnostic performance for MetS prediction in both sexes. 

Keywords

Anthropometric indices

Metabolic Syndrome

ROC curve

Sensitivity

Specificity 

Co-Author(s)

Binod Manandhar, Clark Atlanta University
Krishna Das Manandhar, Central Dept of Biotechnology, Tribhuvan University

First Author

Daya Ram Pokharel, Manipal College of Medical Sciences

Presenting Author

Binod Manandhar, Clark Atlanta University

Exploring Ancient Vessel Morphology using Model Based Clustering

How often do statisticians get to work on ancient pottery data from a 14th century archeological site in Greece?
I had the opportunity to collaborate with a group of archeologists to mine data on ancient ceramic vessels retrieved from a sealed well deposit within the archeological site. A model-based clustering method, Gaussian mixture model clustering, was applied to vessel dimensions to identify clusters, and the stability of the clusters was tested using a series of non-parametric tests. The clusters were used to verify that the morphology of the ceramic vessels conforms to the standard vessel shapes identified by archeologists. This presentation will discuss the statistical modeling and the results, in application to uncovering clusters in the ancient ceramic vessel data. 
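A minimal sketch of the Gaussian-mixture clustering step, using scikit-learn on simulated vessel dimensions (the two shape groups, their sizes, and their measurements are invented for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
# Simulated vessel dimensions (height, rim diameter) in cm for two shapes
cups = rng.normal([8.0, 10.0], [1.0, 1.2], size=(60, 2))
jugs = rng.normal([25.0, 12.0], [2.5, 1.5], size=(40, 2))
X = np.vstack([cups, jugs])

# Model-based clustering: fit a two-component Gaussian mixture model
gmm = GaussianMixture(n_components=2, covariance_type="full",
                      random_state=0).fit(X)
labels = gmm.predict(X)

# Recovered component means, sorted by height
means = gmm.means_[np.argsort(gmm.means_[:, 0])]
print(np.round(means, 1))
```

In practice the number of components would be chosen with a criterion such as BIC, and cluster stability would be probed with resampling or the kinds of non-parametric tests the abstract describes.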

Keywords

Archeology

Gaussian Mixture Model

Model Based Clustering

Vessel Morphology 

Co-Author(s)

Mark Kimpel, Indiana University School of Medicine (retired)
Kim Shelton, University of California Berkeley
Rasitha Jayesekere, Butler University

First Author

Lynne Kvapil, Butler University

Presenting Author

Rasitha Jayesekere, Butler University

Extending Synthesis Analysis to Survival Outcome

Meta-analysis is a statistical technique for combining and summarizing prior quantitative studies to assess the impact of a specific subject or intervention. Synthesizing these meta-analyses can determine the consistency and robustness of findings across different populations and settings. Synthesis analysis is one such application: a multivariable meta-analysis that estimates the relationship between multiple predictors and an outcome variable. However, this method has only been applied to linear and logistic models. Survival analysis, which focuses on time-to-event data, offers critical insights into the timing of events such as disease progression or treatment efficacy. Extending synthesis analysis to survival data is a novel meta-analytic approach that allows for a more comprehensive synthesis of public health studies. The extension aims to improve risk estimation and statistical power and to reduce bias, while optimizing temporal, labor, and financial efficiency, with a focus on non-communicable diseases such as cardiovascular disease, diabetes, and cancer. This paper provides a comprehensive review of existing synthesis analyses, guiding their application to survival outcomes. 

Keywords

Meta-analysis

Synthesis analysis

Prediction model

Multivariable analysis

Survival outcome

Non-communicable disease 

Co-Author(s)

Nan Hu, Florida International University
Michelle Hospital, Florida International University

First Author

Rabeya Illyas Noon, Florida International University

Presenting Author

Rabeya Illyas Noon, Florida International University

Flexible Item Response Theory Models for Educational and Healthcare Data

Item Response Theory (IRT) has long been a cornerstone of educational testing, enabling accurate measurement of student ability across diverse types of assessments. Recently, these models have also shown promise in healthcare, capturing latent traits like quality of life, patient satisfaction, and symptom severity. In this work, we present a flexible approach to IRT accommodating multiple item types (e.g., dichotomous, polytomous) and leveraging modern computational methods for parameter estimation. We introduce our open-source Python package IRTorch, which streamlines model building and parameter estimation while offering robust tools for handling large-scale datasets. We demonstrate how these models handle complex response structures in Swedish SAT data and patient-reported outcomes on stroke recovery from the Swedish Stroke Register. We also highlight key insights for practitioners, including guidelines for model selection, diagnostics, and handling missing or noisy data. These findings underscore the broad applicability of modern IRT methods for quantitative research across domains, leading to more nuanced and actionable insights in both education and healthcare. 
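For readers new to IRT, the core of a two-parameter logistic (2PL) model is a single item response function. This generic sketch is not the IRTorch API, just the underlying formula:

```python
import numpy as np

def irt_2pl(theta, a, b):
    """P(correct response) under the 2PL model, for ability theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Average ability on an average-difficulty item: probability 0.5
print(irt_2pl(theta=0.0, a=1.0, b=0.0))  # 0.5

# Higher discrimination steepens the curve around the difficulty point
print(irt_2pl(theta=1.0, a=2.0, b=0.0) > irt_2pl(theta=1.0, a=1.0, b=0.0))  # True
```

Dichotomous models like this one generalize to polytomous items (e.g., graded response models) by stacking ordered category thresholds, which is the kind of flexibility the package targets.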

Keywords

Item Response Theory

Psychometrics

Healthcare

Statistical software

PyTorch 

Co-Author(s)

Marie Eriksson, Umeå University
Marie Wiberg, Umeå University

First Author

Joakim Wallmark

Presenting Author

Joakim Wallmark

Generalized Functional Linear Models for Right-Censored Time-to-Event Data

Interpreting real-time data from wearable devices, such as continuous glucose monitors (CGM), to inform long-term adverse event risk is a central objective of digital health and precision medicine. We address a gap in existing regression-based methods for modeling scalar responses with functional predictors by developing a generalized functional linear model for a right-censored scalar response that incorporates both functional and scalar covariates. We consider a direct binomial model in which a binary outcome indicates the survival of a subject past a fixed time horizon. We approximate the random functional predictors using a truncated Karhunen-Loève expansion, with the truncation parameter permitted to increase with sample size. Inverse probability of censoring weights are used to obtain unbiased effect size estimates in the presence of censoring. By establishing asymptotic normality, we construct confidence intervals for both the scalar coefficients and the parameter function. We illustrate our method by modeling the survival probability of over 2,000 veterans with type 2 diabetes using CGM data and their baseline scalar characteristics. 

Keywords

functional regression

right censoring

generalized linear model

digital health

wearable devices 

Co-Author(s)

Sijie Zheng, UCLA
Tomoki Okuno, UCLA
Jin Zhou, UCLA
Hua Zhou, UCLA
Gang Li, University of California-Los Angeles

First Author

Jonathan Hori

Presenting Author

Jonathan Hori

Life and Functional Time Prediction Using Machine Learning in Palliative Care

Accurate prediction of life expectancy is needed to plan a patient's future in palliative care. The aim of this study is to apply multiple machine learning models to achieve highly accurate predictions and to identify factors that influence functional and life prognosis. Three functional time predictions (walking, eating, and communicating) and a life time prediction were analyzed, each using four models: decision tree, LASSO regression, random forest, and XGBoost. None of the models achieved high accuracy for any prediction. The feature importance of each model showed different characteristics across predictions and models. The RMSEs of LASSO regression, random forest, and XGBoost were about 7 days for each functional time prediction and about 6 days for the life time prediction. Because the survival period in this study was limited to 30 days or less, this error is considered very large for patients. The feature importance showed that laboratory data were important for each prediction. Although no model achieved high accuracy, very useful results were obtained from the feature importance. 

Keywords

palliative care

machine learning

decision tree

LASSO

random forest

XGBoost 

Co-Author

Ayano Takeuchi, Keio University

First Author

Katsuei Takahashi

Presenting Author

Katsuei Takahashi

Linking Potentially Misclassified Healthy Food Access to Diabetes Prevalence

Without access to healthy food, preventing illnesses like diabetes is difficult. This access can be quantified for an area by measuring its distance to the nearest grocery store, but there is a trade-off. We can either measure a more accurate but expensive distance using only passable roads or an error-prone but easy-to-obtain straight-line metric ignoring infrastructure and natural barriers. Fitting a standard regression model to the relationship between disease prevalence and error-prone food access would introduce bias, but fully observing the more accurate measure is often impossible, creating a missing data problem. We address these challenges by deriving a new maximum likelihood estimator for Poisson regression with a binary, error-prone exposure where the errors may depend on additional error-free covariates. Via simulation, we show the consequences of ignoring the error and how the proposed estimator corrects for that bias while preserving more statistical efficiency than the complete-case analysis. Finally, we apply our estimator to data from the Piedmont Triad in North Carolina, where we model the relationship between diabetes prevalence and access to healthy food. 
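The bias from ignoring the error can be seen in a small simulation. For a single binary exposure, the Poisson-regression slope has a closed form (the log ratio of mean counts between exposure groups), so no solver is needed; the misclassification rate, effect size, and sample size below are invented, and this sketch is not the proposed corrected estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
x_true = rng.binomial(1, 0.4, n)       # accurate road-distance access measure
# One-sided misclassification: some truly low-access areas get labeled
# high-access by the straight-line metric (hypothetical 25% error rate)
flip = rng.binomial(1, 0.25, n)
x_err = np.where((x_true == 0) & (flip == 1), 1, x_true)

beta = 0.5                             # true log rate ratio
y = rng.poisson(np.exp(0.1 + beta * x_true))

def poisson_slope(x, y):
    # MLE of the slope in a Poisson GLM with one binary covariate
    return np.log(y[x == 1].mean() / y[x == 0].mean())

print(poisson_slope(x_true, y))  # close to the true 0.5
print(poisson_slope(x_err, y))   # attenuated toward zero
```

Running this shows the naive slope shrinking well below the truth, which is the bias the likelihood-based correction is designed to remove.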

Keywords

Grocery Stores

Maximum Likelihood Estimation

Measurement Error

Missing Data

One-Sided Misclassification

Poisson Regression 

Co-Author(s)

Sarah Lotspeich, Wake Forest University
Anh Nguyen, Wake Forest University

First Author

Ashley Mullan, Vanderbilt University

Presenting Author

Ashley Mullan, Vanderbilt University

Machine Learning Approaches to Identify Neonates at Risk for Post-Discharge Mortality in Dar es Sala

Machine learning (ML) can increase discriminatory value in risk assessment tools compared to traditional regression. We explored the performance of ML models, compared to a previously derived logistic regression model (area under the curve [AUC]=0.77, 10 variables), for predicting all-cause mortality within 60 days post-discharge among neonates from two national referral hospitals in sub-Saharan Africa.
In a prospective cohort of 2,294 neonates (3% mortality rate), data were randomly split (80% training, 20% testing). We addressed class imbalance with Synthetic Minority Oversampling and selected variables via minimum-Redundancy maximum-Relevance. We trained random forest, XGBoost, hist gradient boosting, support vector machine (SVM), and neural network models, optimizing hyperparameters via 5-fold cross-validation.
Hist gradient boosting, random forest, and XGBoost achieved AUCs of 0.99 with six variables. The neural network (AUC=0.97) required eight, and the SVM (AUC=0.89) required 17 and was computationally heavy. ML models outperformed logistic regression (p<0.001). Selecting parsimonious, high-accuracy, low-cost models is key for feasible clinical implementation. 

Keywords

Machine learning

Prediction modeling

Logistic regression

Model performance

Risk prediction 

Co-Author(s)

Chris Rees, Emory University School of Medicine; Children’s Healthcare of Atlanta
Rodrick Kisenge, Muhimbili University of Health and Allied Sciences
Evance Godfrey, Muhimbili University of Health and Allied Sciences
Readon Ideh, John F. Kennedy Medical Center
Julia Kamara, John F. Kennedy Medical Center
Ye-Jeung Coleman-Nekar, John F. Kennedy Medical Center
Abraham Samma, Muhimbili University of Health and Allied Sciences
Hussein Manji, Muhimbili University of Health and Allied Sciences; The Aga Khan Health Services
Christopher Sudfeld, Harvard T.H. Chan School of Public Health
Michelle Niescierenko, Boston Children’s Hospital; Harvard Medical School
Claudia Morris, Emory University School of Medicine; Children’s Healthcare of Atlanta
Todd Florin, Ann & Robert H. Lurie Children's Hospital of Chicago
Christopher Duggan, Harvard T.H. Chan School of Public Health; Boston Children’s Hospital
Karim Manji, Muhimbili University of Health and Allied Sciences
Rishikesan Kamaleswaran, Department of Biostatistics and Bioinformatics, Duke University

First Author

Adrianna Westbrook, Emory University

Presenting Author

Adrianna Westbrook, Emory University

MetaScope: An R Package for Accurate Metagenomic Taxonomic Profiling

MetaScope is a novel R package designed for the rapid, accurate taxonomic profiling of metagenomic and 16S sequencing reads. MetaScope addresses a critical need for efficient and precise microbial composition analysis. Its core modules are MetaRef, which builds reference genome sequence libraries, MetaAlign, which aligns reads to the target library using Bowtie 2 or Subread aligners, MetaFilter, which filters reads that align to the host library, and MetaID, which reassigns ambiguously mapped reads to their likely genome of origin using a Bayesian model. MetaScope also offers demultiplexing and output aggregation modules to enhance functionality and integrates with the animalcules R package for downstream microbiome analysis. A novel feature is the complementary coverage plots in the MetaID module, enabling additional quality checking and improved post-processing. We evaluated MetaScope's performance with benchmarking against mock microbial communities using 16S datasets. These results demonstrate that MetaScope achieves strain-level differentiation capabilities and demonstrates high sensitivity compared to other 16S profilers. 

Keywords

Bayesian

Metagenomics

Microbiome

Microbial Profiling

Genomics 

Co-Author(s)

W Evan Johnson, Rutgers University
Sean Lu, Rutgers University

First Author

Aubrey Odom, Boston University

Presenting Author

Aubrey Odom, Boston University

Mixed effects modeling to improve inference in dose response studies with plate variability

Microtiter plate formats are a standard tool in laboratory experiments, allowing scientists to investigate physical, chemical, and biological reactions of test articles in various assays. We investigated data from a 384-well in-vitro study involving 18 test articles, which included 13 mixtures and an active product constituent, along with positive and negative controls (e.g., vehicle controls). The experiment was conducted using two cell types and two assays, with multiple replicates. Test articles were dosed at 10 concentrations in duplicate, spaced at equal log intervals. Despite normalization to vehicle controls, marked plate-to-plate variability was observed. Dose response curves were fitted for each replicate using the tcplfit2 library in R, selecting the best-fitting model based on the lowest AIC. We focused on the benchmark dose concentration as a key endpoint of the fitted curve. We applied a mixed-effects model with plate as a random effect to account for the observed plate-specific variability. This modeling approach provides a framework for addressing plate variability in dose response studies, enhancing reproducibility and accuracy. 
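The random-effect structure can be sketched with a simplified linear analogue in Python's statsmodels (a full tcplfit2-style curve fit is omitted); the plate shifts, slope, and noise levels below are invented for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_plates, n_wells = 6, 20
plate_shift = rng.normal(0, 0.5, n_plates)   # hypothetical plate effects

rows = []
for p in range(n_plates):
    logc = rng.uniform(-2, 2, n_wells)       # log10 concentration
    resp = 1.0 + 0.8 * logc + plate_shift[p] + rng.normal(0, 0.3, n_wells)
    rows += [{"plate": p, "logc": c, "resp": r} for c, r in zip(logc, resp)]
df = pd.DataFrame(rows)

# Mixed-effects model: fixed concentration effect, random plate intercept
fit = smf.mixedlm("resp ~ logc", df, groups=df["plate"]).fit()
print(round(fit.params["logc"], 2))  # slope recovered near the true 0.8
```

Treating plate as a random intercept absorbs the plate-to-plate shifts so the fixed concentration effect is estimated against within-plate variability only.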

Keywords

Mixed effect model

in-vitro experiment

dose response modeling

Toxicology

Cell-based assays 

Co-Author(s)

Shawn Harris, Social & Scientific Systems
Guanhua Xie, DLH Corp
Stephanie Smith-Roe, National Institute of Environmental Health Sciences
Keith Shockley, National Institutes of Health
Stephen Ferguson, National Institute of Environmental Health Sciences

First Author

Caroll Co, DLH

Presenting Author

Caroll Co, DLH

Prioritized and data-robust estimation strategies for tumor growth studies

Longitudinal tumor growth studies serve a foundational role in preclinical therapeutic evaluation, acting as precursors to human clinical trials. Despite the prevalence of these experiments, there is little consensus on how best to analyze the resulting data, largely due to underemphasized data challenges such as non-linearity, censoring, and correlated errors. We capitalize on common design characteristics to develop a composite, prioritized estimator that is interpretable as well as robust to several of these data challenges. To provide a platform for identifying treatment synergy or dose toxicity, the semi-parametric proportional odds model is proposed to extend our estimator to the regression setting. We develop an algorithm to maximize a quasi-conditional likelihood, allowing us to avoid the estimation of N-1 nuisance parameters. Finally, we show how a time-dependent win ratio can be used to extend our method to the case of clustered data, where one animal may have several tumors under study. Closed-form cluster-correct variance calculations are provided. The implementation of the methods is demonstrated on several HPV+ head and neck squamous cell carcinoma xenograft models. 

Keywords

Win ratio

Composite

Semi-parametric

Preclinical

Proportional odds

Growth models 

Co-Author(s)

Randall Kimple, UW-Madison
Gopal Iyer, UW-Madison
Richard Chappell, UW-Madison
Menggang Yu, University of Michigan

First Author

Colin Longhurst, University of Wisconsin-Madison

Presenting Author

Colin Longhurst, University of Wisconsin-Madison

WITHDRAWN Signature for response to PD-L1 inhibitor in metastatic Urothelial Cancer

About 90% of human cancer deaths are due to metastasis. To date, immune checkpoint inhibitors (ICIs) are one of the frontier treatments that have improved the survival of metastatic cancer patients with few side effects. However, the objective response rate for ICIs is low, only ~30% in urothelial carcinoma (UC), highlighting the need to identify signatures for response prediction. Several state-of-the-art signatures have been revealed in first-tier journals, demonstrating the area's importance. As the number of genes (features; ~20,000) greatly exceeds the sample sizes of training sets (≤300), we first developed feature selection procedures to reduce features to a few hundred. Next, we trained several classifiers using Imvigor210 and the selected genes, comprising RNA-seq and clinical data of ~298 patients with mUC, via 5-fold cross-validation. In particular, our predictor based on logit regression (LogitDA) with the revealed signature achieved a prediction AUC of 0.75; our signature outperformed the known signatures (e.g., PD-L1, PD-1, the IFNG, tGE8, T exhaust, and T inflamed). Overall, our findings show that LogitDA and our signature predict immunotherapy response well in mUC. 

Keywords

biomarker

cancer

machine learning

regression

prediction 

Co-Author

Peter Langfelder, UCLA

First Author

Grace Shieh, Institute of Statistical Science

Statistical Exploration of Calibration Curve Modeling with STEM Students

Linear regression is typically part of a statistical education. One application of linear regression in science and engineering is calibration curve modeling, for example in chemistry. When creating a calibration curve, the technician creates multiple replicates of the response at fixed values of the predictor. A technique such as least squares is then used to estimate the calibration curve. The curve is estimated with error, and that error is used in other parts of the calibration analysis. Although it is not a recommended practice, the calibration curve is sometimes fit using averages of the response instead of the original observations. We discuss how to explore the differences between these approaches with STEM students: visually, through simulation, and theoretically. 
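A simulation along these lines can start from a sketch like the following (the concentrations, replicate count, and noise level are invented): with a balanced design, fitting on per-level averages reproduces the raw-data point estimates exactly, but discards the within-level replicate error used for calibration uncertainty:

```python
import numpy as np

rng = np.random.default_rng(3)
conc = np.repeat([1.0, 2.0, 5.0, 10.0, 20.0], 3)  # 3 replicates per level
signal = 0.05 + 0.4 * conc + rng.normal(0, 0.2, conc.size)

# Least-squares fit on the raw replicates
slope_raw, icpt_raw = np.polyfit(conc, signal, 1)

# Least-squares fit on per-level means (the discouraged shortcut)
levels = np.unique(conc)
means = np.array([signal[conc == u].mean() for u in levels])
slope_avg, icpt_avg = np.polyfit(levels, means, 1)

# Balanced design: identical point estimates ...
print(np.allclose([slope_raw, icpt_raw], [slope_avg, icpt_avg]))  # True

# ... but the averaged fit's residuals no longer reflect replicate
# variability, so downstream error estimates are understated.
```

This makes the classroom point cleanly: the two fits disagree not in their coefficients but in the error estimate that feeds the rest of the calibration analysis.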

First Author

Megan Heyman, Rose-Hulman Institute of Technology

Presenting Author

Megan Heyman, Rose-Hulman Institute of Technology

Want Some Pie? The Impact of Graphical Literacy on Data Visualization Choices in Public Health

Health communication through data visualization is one of the most important skills public health professionals need to help their communities effectively. However, rigorous training in data visualization and statistics is rare in public health degree programs across the country. There is a critical need to account for graphic literacy levels in the general population for effective communication of complex health issues. We performed a cross-sectional exploratory study to assess the graphic literacy of a nationally representative sample (N=524) in the United States and preferences among data visualization types applied to COVID-19 variant proportion data. Results showed that graphic literacy levels, as measured by the Short Graphic Literacy Scale, were lower than previously measured. Those with higher graphic literacy were more likely to select bar and pie charts, while those with lower graphic literacy were more likely to select other chart types. These findings contribute to the development of educational strategies for effective health communication for public health students, enabling them to combat misinformation and reduce health disparities among disadvantaged populations. 

Keywords

Statistical Communication

Data Visualization

Health Communication

Public Health

Education in the Health Sciences 

Co-Author(s)

Sarah Maynard, UNLV
Pamela Paula Pioquinto, UNLV

First Author

Miguel Fudolig, University of Nevada-Las Vegas

Presenting Author

Miguel Fudolig, University of Nevada-Las Vegas

Identifying Incident Proliferative Diabetic Retinopathy Using EHR Data: A Comparison of Methods

Proliferative Diabetic Retinopathy (PDR), the advanced stage of diabetic retinopathy (DR), causes abnormal retinal vessel growth and vision loss. Accurately identifying incident PDR in electronic health records is important for disease monitoring and evaluating interventions. This study evaluates classification methods for identifying incident PDR cases, using the UCSF De-identified Clinical Data Warehouse. Patients aged ≥ 18 with at least one DR diagnosis by an eye provider and available de-identified clinical notes were included. 321 patients were randomly selected for chart review by an ophthalmologist (gold standard), confirming 158 PDR cases. Six methods were evaluated: first ICD9/10 code with no lookback period, first ICD9/10 with a one-year lookback period in any department, first ICD9/10 with a one-year lookback period in ophthalmology, rule-based NLP on clinical notes, best-performing ICD9/10 method with NLP, and a generative AI model. Each method will be compared against the gold standard using sensitivity, specificity, PPV, NPV, and F1 score. The proposed methodologies will provide insights into the use of structured and unstructured data for identifying incident PDR. 

Keywords

Electronic Health Records (EHR)

Ophthalmology

Incident Disease 

Co-Author(s)

Sean Yonamine, UCSF
Cathy Sun, UCSF

First Author

Ritika Batte

Presenting Author

Ritika Batte