Wednesday, Aug 6: 8:30 AM - 10:20 AM
4150
Contributed Speed
Music City Center
Room: CC-104A
Presentations
Riemann integration offers distinct advantages over single-point analysis, making it a preferable basis for an endpoint under certain conditions. A prominent application is the estimation of the area under the curve (AUC), utilized in pharmacokinetic and pharmacodynamic analyses. Since these measurements are continuous but collected at discrete timepoints, Riemann integration is the most easily applied method for estimating the integrals.
As an example, summed pain intensity (SPI) is calculated by applying the trapezoidal rule, a version of Riemann integration, to Numeric Pain Rating Scale (NPRS) measurements. Simulations on this endpoint show reductions in the coefficient of variation compared to single-point analysis when there is variance between timepoints, and thus an increased statistical effect size.
This methodology can be utilized in additional endpoints to enhance endpoint robustness through aggregation of continuous data across multiple time points.
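The trapezoidal-rule AUC described above can be sketched in a few lines. The NPRS values and timepoints below are hypothetical, chosen only to illustrate the calculation.

```python
import numpy as np

# Hypothetical NPRS pain scores (0-10) at discrete timepoints (hours).
times = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
nprs = np.array([8.0, 6.0, 5.0, 3.0, 2.0])

# Trapezoidal-rule approximation of the area under the pain-intensity
# curve: average adjacent scores and weight by the interval width.
spi = float(np.sum((nprs[:-1] + nprs[1:]) / 2 * np.diff(times)))
print(spi)  # summed pain intensity (SPI) over the observation window
```

Because the aggregation averages over within-subject noise at individual timepoints, the resulting endpoint can have a smaller coefficient of variation than any single measurement.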
Keywords
Endpoint
Power
Pharmacodynamics
Pharmacokinetics
Heart sound recognition is crucial for early cardiovascular disease detection, but auscultation alone often leads to diagnostic challenges, even for experienced clinicians. To address this, we propose a convolutional recurrent neural network (CRNN) model combined with machine learning, utilizing MFCC, STFT, and Deep Scattering features. Applied to 512 datasets from E-Da Hospital, our CRNNA + LightGBM model achieved 92.2% accuracy (specificity: 96.2%, sensitivity: 88%), outperforming physicians by 9.7% in accuracy and 24% in sensitivity.
Using self-attention mechanisms, we visualized the model's focus areas, which closely matched physicians' auscultation regions, demonstrating its ability to act as a diagnostic proxy. Validation on the 2016 PhysioNet/CinC Challenge database further confirmed the model's robustness, achieving 95% accuracy (specificity: 93%, sensitivity: 98%).
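The time-frequency features feeding such a model can be illustrated with a short-time Fourier transform. This is a minimal sketch on a synthetic signal, not the authors' pipeline; the sampling rate and burst pattern are assumptions standing in for a real phonocardiogram.

```python
import numpy as np
from scipy.signal import stft

fs = 2000  # Hz; an assumed sampling rate for a PCG recording
t = np.arange(0, 1.0, 1 / fs)
# Synthetic stand-in for a heart sound: low-frequency periodic bursts.
x = np.sin(2 * np.pi * 50 * t) * (np.sin(2 * np.pi * 1.2 * t) > 0.9)

# Short-time Fourier transform: the magnitude spectrogram is the kind of
# 2-D time-frequency input a CRNN consumes in place of raw audio.
f, seg_t, Zxx = stft(x, fs=fs, nperseg=256)
spectrogram = np.abs(Zxx)  # shape: (frequency bins, time frames)
print(spectrogram.shape)
```

MFCC and deep scattering features are further transformations of this same time-frequency representation.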
Keywords
CRNNA
Deep scattering
Heart sound classification
Light GBM
MFCC
PCG
Co-Author(s)
Ting-Yu Yan, Department of Applied Mathematics, National Sun Yat-sen University
Yu-Jung Huang, I-Shou University
Ming-chun Yang, Department of Pediatrics, E-Da Hospital, Kaohsiung, Taiwan
Wei-Chen Lin, Department of Medical Research, E-DA Hospital
First Author
Meihui Guo, National Sun Yat-Sen University
Presenting Author
Meihui Guo, National Sun Yat-Sen University
Dysregulated neuroinflammation is hypothesized to be a leading contributor to neurodegenerative diseases. Microglia, the immune cells of the brain, are crucial in maintaining tissue homeostasis and driving neuroinflammation. Microglia depend on colony-stimulating factor 1 receptor (CSF1R) signaling to survive. CSF1R inhibitors (e.g., PLX5622) are used to deplete microglia in the brain, providing valuable tools for studying microglial dynamics. In this in vivo pharmacology study, we investigate dose-dependent microglial depletion by PLX5622 in wildtype mice (n=48). Six groups of mice (8 mice each: 4 males and 4 females) were treated for 4 weeks with various drug doses (0, 100, 300, 600, 900, or 1200 mg/kg). Whole-brain sections immunostained with Iba1 will be used to quantify microglial depletion and will be analyzed via one-way ANOVA, with Tukey's post-hoc test to assess dose differences. In surviving microglia, morphological phenotypes, branch length, and soma size will be analyzed using multivariate analysis of variance (MANOVA) and clustering techniques to identify dose-dependent differences. These findings will contribute to understanding microglial dynamics in response to CSF1R inhibition.
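The planned ANOVA-plus-Tukey analysis can be sketched as follows. The Iba1+ cell counts are simulated placeholders (the study's data are not yet analyzed); only the dose levels and group size come from the design above.

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
doses = [0, 100, 300, 600, 900, 1200]  # mg/kg, per the study design
# Hypothetical Iba1+ counts per section: depletion increases with dose.
counts = {d: rng.normal(100 - 0.06 * d, 5, size=8) for d in doses}

# One-way ANOVA across the six dose groups.
F, p = f_oneway(*counts.values())
print(F, p)

# Tukey's HSD post-hoc test for pairwise dose-group differences.
y = np.concatenate(list(counts.values()))
g = np.repeat(doses, 8)
tukey = pairwise_tukeyhsd(y, g)
print(tukey.summary())
```

Tukey's procedure controls the family-wise error rate across all fifteen pairwise dose comparisons.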
Keywords
Microglial
CSF1R inhibition
Dose-dependent depletion
One-way ANOVA
Tukey’s post-hoc test
Co-Author(s)
Yumary Rubio, Institute for Neurodegenerative Diseases, Weill Institute for Neurosciences, UCSF
Stephanie Huard, Institute for Neurodegenerative Diseases, Weill Institute for Neurosciences, UCSF
Suzanne Dufault, Department of Epidemiology and Biostatistics, UCSF
Carlo Condello, Institute for Neurodegenerative Diseases, Weill Institute for Neurosciences, UCSF
First Author
Nya Campbell, Department of Epidemiology and Biostatistics
Presenting Author
Nya Campbell, Department of Epidemiology and Biostatistics
Background: Binary endpoints at two timepoints (e.g., pre- vs. post-treatment) are common in healthcare research. The Generalized Bivariate Bernoulli Model (GBBM) is a specialized GLM for bivariate binary data but lacks software for direct analysis. Additionally, the original comparison of the GBBM dependency test to regressive logistic regression is flawed.
Methods: We propose a re-parameterized logistic regression model, proving its equivalence to the GBBM dependency test theoretically and empirically. Simulations compare the power of the GBBM test with a) the regressive logistic model, b) our re-parameterized logistic model, and c) the Pearson Chi-square test. We also analyze infant mortality data from BDHS.
Results: The GBBM test's power differs from the regressive logistic model but matches our re-parameterized logistic model across effect and sample sizes.
Conclusion: This study refines dependency analysis in bivariate binary data, enhancing accessibility for researchers.
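One of the comparators above, the Pearson chi-square test of dependence between the two binary timepoints, can be sketched directly. The 2x2 table below is hypothetical, standing in for pre- vs. post-treatment outcomes.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical paired binary outcomes
# (rows: pre-treatment 0/1, columns: post-treatment 0/1).
table = np.array([[60, 15],
                  [10, 40]])

# Pearson chi-square test of dependence between the two timepoints.
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p, dof)
```

The GBBM dependency test and the re-parameterized logistic model target the same null hypothesis of independence between the two timepoints, which is why a power comparison among the three is meaningful.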
Keywords
Longitudinal binary endpoints
generalized linear models
repeated measures
Various anthropometric indices have been proposed to assess central obesity and predict metabolic syndrome (MetS). This presentation aims to compare the predictive potential of anthropometric indices for MetS and its components. Among Nepalese adults, the Visceral Adiposity Index (VAI) and Lipid Accumulation Product (LAP) outperformed traditional measures such as body mass index (BMI), waist-to-hip ratio (WHR), and waist-to-height ratio (WHtR) in predicting MetS and its components. Optimal cutoffs were as follows: VAI > 1.97 (females), > 2.16 (males); LAP > 53.4 (both sexes); WHR > 0.98 (both sexes); WHtR > 0.638 (females), > 0.56 (males); Body Roundness Index (BRI) > 5.76 (females), > 4.75 (males). A Body Shape Index (ABSI) and the Body Adiposity Index (BAI) exhibited the poorest diagnostic performance for MetS prediction in both sexes.
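Optimal cutoffs like those reported above are typically read off the ROC curve at the point maximizing Youden's J (sensitivity + specificity - 1). This sketch uses simulated VAI values, not the study's data; the distributions are assumptions.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(1)
# Hypothetical VAI values: higher on average in subjects with MetS.
mets = np.r_[np.zeros(200), np.ones(100)].astype(int)
vai = np.r_[rng.normal(1.5, 0.5, 200), rng.normal(2.6, 0.7, 100)]

# ROC analysis: the optimal cutoff maximizes Youden's J = TPR - FPR.
fpr, tpr, thresholds = roc_curve(mets, vai)
j = tpr - fpr
cutoff = thresholds[np.argmax(j)]
auc = roc_auc_score(mets, vai)
print(auc, cutoff)
```

Indices are then ranked by AUC, which is how a measure like VAI can be judged to outperform BMI or WHR.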
Keywords
Anthropometric indices
Metabolic Syndrome
ROC curve
Sensitivity
Specificity
How often do statisticians get to work on ancient pottery data from a 14th century archeological site in Greece?
I had the opportunity to collaborate with a group of archeologists to mine data on ancient ceramic vessels retrieved from a sealed well deposit found within the site. A model-based cluster analysis method, Gaussian Mixture Model clustering, was applied to vessel dimensions to identify clusters, and cluster stability was tested using a series of non-parametric tests. The clusters were used to verify that the morphology of the ceramic vessels conforms to the standard vessel shapes identified by archeologists. This presentation will discuss the statistical modeling and the results, as applied to uncovering clusters in the ancient ceramic vessel data.
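A minimal version of the clustering step might look like this. The vessel dimensions below are simulated stand-ins, and choosing the number of components by BIC is a common convention for model-based clustering rather than necessarily the exact procedure used here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Hypothetical vessel dimensions (height, rim diameter in cm):
# two shape groups with distinct size profiles.
group_a = rng.normal([10.0, 8.0], [1.0, 0.8], size=(60, 2))
group_b = rng.normal([25.0, 12.0], [2.0, 1.0], size=(40, 2))
X = np.vstack([group_a, group_b])

# Model-based clustering: fit Gaussian mixtures with 1-4 components
# and select the number of clusters by BIC.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 5)}
best_k = min(bics, key=bics.get)
print(best_k)
```

The fitted component means and covariances can then be compared against the archeologists' standard vessel shape categories.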
Keywords
Archeology
Gaussian Mixture Model
Model Based Clustering
Vessel Morphology
Meta-analysis is a statistical technique to combine and summarize prior quantitative studies to assess the impact of a specific subject or intervention. Synthesizing these meta-analyses can determine the consistency and robustness of findings across different populations and settings. Synthesis analysis is one such application: a multivariable meta-analysis that estimates the relationship between multiple predictors and an outcome variable. However, this method has only been applied to linear and logistic models. Survival analysis, which focuses on time-to-event data, offers critical insights into the timing of events such as disease progression or treatment efficacy. Extending synthesis analysis to survival data is a novel meta-analytic approach that allows for a more comprehensive synthesis of public health studies. The extension aims to improve risk estimation and statistical power and to reduce bias while optimizing temporal, labor, and financial efficiencies, focusing on non-communicable diseases such as cardiovascular disease, diabetes, and cancer. This paper provides a comprehensive review of existing synthesis analyses, guiding their application to survival outcomes.
Keywords
Meta-analysis
Synthesis analysis
Prediction model
Multivariable analysis
Survival outcome
Non-communicable disease
Item Response Theory (IRT) has long been a cornerstone of educational testing, enabling accurate measurement of student ability across diverse types of assessments. Recently, these models have also shown promise in healthcare, capturing latent traits like quality of life, patient satisfaction, and symptom severity. In this work, we present a flexible approach to IRT accommodating multiple item types (e.g., dichotomous, polytomous) and leveraging modern computational methods for parameter estimation. We introduce our open-source Python package IRTorch, which streamlines model building and parameter estimation while offering robust tools for handling large-scale datasets. We demonstrate how these models handle complex response structures in Swedish SAT data and patient-reported outcomes on stroke recovery from the Swedish Stroke Register. We also highlight key insights for practitioners, including guidelines for model selection, diagnostics, and handling missing or noisy data. These findings underscore the broad applicability of modern IRT methods for quantitative research across domains, leading to more nuanced and actionable insights in both education and healthcare.
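At the core of the dichotomous models discussed here is the item response function. This sketch shows the two-parameter logistic (2PL) form in plain numpy; it illustrates the model, not IRTorch's actual API.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response
    given ability theta, item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# When ability equals item difficulty the probability is exactly 0.5;
# higher ability yields a higher probability of success.
print(p_correct(0.0, a=1.2, b=0.0))  # 0.5
print(p_correct(2.0, a=1.2, b=0.0))  # well above 0.5
```

Polytomous items generalize this curve to ordered categories (e.g., graded response models), and packages such as IRTorch estimate the item parameters and latent abilities jointly from response matrices.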
Keywords
Item Response Theory
Psychometrics
Healthcare
Statistical software
PyTorch
Interpreting real-time data from wearable devices, such as continuous glucose monitors (CGM), to inform long-term adverse event risk is a central objective of digital health and precision medicine. We address a gap in existing regression-based methods for modeling scalar responses with functional predictors by developing a generalized functional linear model for a right-censored scalar response that incorporates both functional and scalar covariates. We consider a direct binomial model in which a binary outcome indicates the survival of a subject past a fixed time horizon. We approximate the random functional predictors using a truncated Karhunen-Loève expansion, with the truncation parameter permitted to increase with sample size. Inverse probability of censoring weights are used to obtain unbiased effect size estimates in the presence of censoring. By establishing asymptotic normality, we construct confidence intervals for both the scalar coefficients and the parameter function. We illustrate our method by modeling the survival probability of over 2,000 veterans with type 2 diabetes using CGM data and their baseline scalar characteristics.
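The truncated Karhunen-Loeve step above can be sketched via an SVD of centered curves observed on a common grid. The CGM-like curves below are simulated; the two-basis structure and truncation level K=2 are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical functional predictors: n subjects on a common time grid.
n, T = 200, 96
t = np.linspace(0, 1, T)
scores_true = rng.normal(0, [2.0, 1.0], size=(n, 2))
curves = (scores_true[:, :1] * np.sin(2 * np.pi * t)
          + scores_true[:, 1:] * np.cos(2 * np.pi * t)
          + rng.normal(0, 0.1, size=(n, T)))

# Truncated Karhunen-Loeve expansion via SVD of the centered curves:
# keep the leading K eigenfunctions; the subject-level scores become
# scalar predictors in the downstream generalized linear model.
centered = curves - curves.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
K = 2  # truncation level; in the paper it is allowed to grow with n
fpc_scores = U[:, :K] * s[:K]  # estimated subject scores
eigenfunctions = Vt[:K]        # estimated basis functions
explained = (s[:K] ** 2).sum() / (s ** 2).sum()
print(explained)
```

In the censored-response setting, these scores enter a direct binomial model with inverse-probability-of-censoring weights applied to each subject's contribution.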
Keywords
functional regression
right censoring
generalized linear model
digital health
wearable devices
Accurate prediction of life expectancy is needed to plan patients' futures in palliative care. The aim of this study is to apply multiple machine learning models to achieve highly accurate predictions and to identify factors that influence functional and life prognosis. Three functional time predictions (for walking, eating, and communicating) and a survival time prediction were analyzed using four models: decision tree, LASSO regression, random forest, and XGBoost. None of the models achieved high accuracy for any prediction, and the feature importances showed different characteristics across predictions and models. The RMSEs of LASSO regression, random forest, and XGBoost were about 7 days for each functional time prediction and about 6 days for the survival time prediction. Because the study was limited to patients with survival of 30 days or less, this error is considered very large for patients. Feature importance showed that laboratory data were important for each prediction. Although no model achieved high accuracy, very useful results were obtained from the feature importances.
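A model-comparison loop of the kind described might be sketched as below. The data are synthetic stand-ins for lab values and survival days, and GradientBoostingRegressor is used as a scikit-learn stand-in for XGBoost so the sketch stays self-contained.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(4)
# Hypothetical data: 10 lab features predicting survival days (<= 30).
X = rng.normal(size=(300, 10))
y = np.clip(15 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 4, 300), 0, 30)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "lasso": LassoCV(cv=5),
    "random_forest": RandomForestRegressor(random_state=0),
    "boosting": GradientBoostingRegressor(random_state=0),  # XGBoost stand-in
}
rmses = {}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    rmses[name] = mean_squared_error(y_te, m.predict(X_te)) ** 0.5
print(rmses)  # RMSE in days for each model
```

Feature importances (e.g., `feature_importances_` for the tree ensembles, coefficient magnitudes for LASSO) are then compared across models, as in the study.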
Keywords
palliative care
machine learning
decision tree
LASSO
random forest
XGBoost
Without access to healthy food, preventing illnesses like diabetes is difficult. This access can be quantified for an area by measuring its distance to the nearest grocery store, but there is a trade-off. We can either measure a more accurate but expensive distance using only passable roads or an error-prone but easy-to-obtain straight-line metric that ignores infrastructure and natural barriers. Fitting a standard regression model to the relationship between disease prevalence and error-prone food access would introduce bias, but fully observing the more accurate measure is often impossible, creating a missing data problem. We address these challenges by deriving a new maximum likelihood estimator for Poisson regression with a binary, error-prone exposure where the errors may depend on additional error-free covariates. Via simulation, we show the consequences of ignoring the error and how the proposed estimator corrects for that bias while preserving more statistical efficiency than the complete case analysis. Finally, we apply our estimator to data from the Piedmont Triad in North Carolina, where we model the relationship between diabetes prevalence and access to healthy food.
Keywords
Grocery Stores
Maximum Likelihood Estimation
Measurement Error
Missing Data
One-Sided Misclassification
Poisson Regression
Machine learning (ML) can increase discriminatory value in risk assessment tools compared to traditional regression. We explored the performance of ML models, compared to a previously derived logistic regression model (area under the curve [AUC]=0.77, 10 variables), for predicting all-cause mortality within 60 days post-discharge among neonates from two national referral hospitals in sub-Saharan Africa.
In a prospective cohort of 2,294 neonates (3% mortality rate), data were randomly split (80% training, 20% testing). We addressed class imbalance with Synthetic Minority Oversampling and selected variables via minimum-Redundancy maximum-Relevance. We trained random forest, XGBoost, hist gradient boosting, support vector machine (SVM), and neural network models, optimizing hyperparameters via 5-fold cross-validation.
Hist gradient boosting, random forest, and XGBoost achieved AUCs of 0.99 with six variables. The neural network (AUC=0.97) required eight variables, and the SVM (AUC=0.89) required 17 and was computationally heavy. ML models outperformed logistic regression (p<0.001). Selecting parsimonious, high-accuracy, low-cost models is key for feasible clinical implementation.
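A skeleton of this kind of imbalanced-classification comparison is sketched below on synthetic data matching the cohort's rough dimensions. The study uses SMOTE (from imblearn) for oversampling; class weighting is shown here as a simpler self-contained alternative, and the AUCs are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the cohort: ~3% positive class, 80/20 split.
X, y = make_classification(n_samples=2294, n_features=17, weights=[0.97],
                           class_sep=1.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

aucs = {}
for name, model in [
    ("logistic", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ("random_forest", RandomForestClassifier(class_weight="balanced",
                                             random_state=0)),
]:
    model.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(aucs)
```

Stratifying the split preserves the 3% mortality rate in both partitions, which matters when only a few dozen positive cases exist.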
Keywords
Machine learning
Prediction modeling
Logistic regression
Model performance
Risk prediction
Co-Author(s)
Chris Rees, Emory University School of Medicine; Children’s Healthcare of Atlanta
Rodrick Kisenge, Muhimbili University of Health and Allied Sciences
Evance Godfrey, Muhimbili University of Health and Allied Sciences
Readon Ideh, John F. Kennedy Medical Center
Julia Kamara, John F. Kennedy Medical Center
Ye-Jeung Coleman-Nekar, John F. Kennedy Medical Center
Abraham Samma, Muhimbili University of Health and Allied Sciences
Hussein Manji, Muhimbili University of Health and Allied Sciences; The Aga Khan Health Services
Christopher Sudfeld, Harvard T.H. Chan School of Public Health
Michelle Niescierenko, Boston Children’s Hospital; Harvard Medical School
Claudia Morris, Emory University School of Medicine; Children’s Healthcare of Atlanta
Todd Florin, Ann & Robert H. Lurie Children's Hospital of Chicago
Christopher Duggan, Harvard T.H. Chan School of Public Health; Boston Children’s Hospital
Karim Manji, Muhimbili University of Health and Allied Sciences
Rishikesan Kamaleswaran, Department of Biostatistics and Bioinformatics, Duke University
First Author
Adrianna Westbrook, Emory University
Presenting Author
Adrianna Westbrook, Emory University
MetaScope is a novel R package designed for the rapid, accurate taxonomic profiling of metagenomic and 16S sequencing reads. MetaScope addresses a critical need for efficient and precise microbial composition analysis. Its core modules are MetaRef, which builds reference genome sequence libraries; MetaAlign, which aligns reads to the target library using the Bowtie 2 or Subread aligners; MetaFilter, which filters reads that align to the host library; and MetaID, which reassigns ambiguously mapped reads to their likely genome of origin using a Bayesian model. MetaScope also offers demultiplexing and output aggregation modules to enhance functionality and integrates with the animalcules R package for downstream microbiome analysis. A novel feature is the complementary coverage plots in the MetaID module, enabling additional quality checking and improved post-processing. We evaluated MetaScope's performance with benchmarking against mock microbial communities using 16S datasets. These results demonstrate that MetaScope achieves strain-level differentiation and higher sensitivity compared to other 16S profilers.
Keywords
Bayesian
Metagenomics
Microbiome
Microbial Profiling
Genomics
Microtiter plate formats are a standard tool in laboratory experiments, allowing scientists to investigate physical, chemical, and biological reactions of test articles in various assays. We investigated data from a 384-well in-vitro study involving 18 test articles, which included 13 mixtures and an active product constituent, along with positive and negative controls (e.g., vehicle controls). The experiment was conducted using two cell types and two assays, with multiple replicates. Test articles were dosed in 10 concentrations in duplicate, spaced at equal log intervals. Despite normalization to vehicle controls, marked plate-to-plate variability was observed. Dose-response curves were fitted for each replicate using the tcplfit2 library in R, selecting the best fitted model based on the lowest AIC. We focused on the benchmark dose concentration as a key endpoint of the fitted curve. We applied a mixed-effects model with plate as a random effect to account for the observed plate-specific variability. This modeling approach provides a framework for addressing plate variability in dose-response studies, enhancing reproducibility and accuracy.
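The per-replicate curve-fitting step can be sketched with a four-parameter Hill model. The study uses tcplfit2 in R with AIC-based model selection; this is a hedged Python stand-in on simulated responses, fitting a single curve family.

```python
import numpy as np
from scipy.optimize import curve_fit

def hill(conc, bottom, top, ec50, n):
    """Four-parameter Hill (log-logistic) dose-response curve."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** n)

# Hypothetical normalized responses at 10 log-spaced concentrations,
# mimicking one replicate of one test article.
conc = np.logspace(-3, 2, 10)
rng = np.random.default_rng(6)
resp = hill(conc, 0.0, 100.0, 0.5, 1.2) + rng.normal(0, 3, 10)

# Fit the curve; bounds keep EC50 and the Hill slope positive.
params, _ = curve_fit(hill, conc, resp, p0=[0.0, 100.0, 1.0, 1.0],
                      bounds=([-np.inf, -np.inf, 1e-6, 0.1],
                              [np.inf, np.inf, np.inf, 10.0]))
print(params)  # estimated bottom, top, EC50, Hill slope
```

The benchmark dose is then derived from the fitted curve, and those per-replicate endpoints feed the mixed-effects model with plate as a random effect.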
Keywords
Mixed effect model
in-vitro experiment
dose response modeling
Toxicology
Cell-based assays
Longitudinal tumor growth studies serve a foundational role in preclinical therapeutic evaluation, acting as precursors to human clinical trials. Despite the prevalence of these experiments, there is little consensus on how best to analyze the resulting data, largely due to underemphasized data challenges such as non-linearity, censoring, and correlated errors. We capitalize on common design characteristics to develop a composite, prioritized estimator that is interpretable as well as robust to several of these data challenges. To provide a platform for identifying treatment synergy or dose toxicity, the semi-parametric proportional odds model is proposed to extend our estimator to the regression setting. We develop an algorithm to maximize a quasi-conditional likelihood, allowing us to avoid the estimation of N-1 nuisance parameters. Finally, we show how a time-dependent win ratio can be used to extend our method to the case of clustered data, where one animal may have several tumors under study. Closed-form cluster-correct variance calculations are provided. The implementation of the methods is demonstrated on several HPV+ head and neck squamous cell carcinoma xenograft models.
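The basic win ratio underlying the clustered extension can be sketched in a few lines. The groups and outcome below are hypothetical (times to reach a tumor-volume threshold, where longer is better), and this unadjusted, non-time-dependent version is shown only to fix ideas.

```python
def win_ratio(treated, control):
    """Unadjusted win ratio: compare every treated-control pair,
    counting a 'win' when the treated outcome is better (larger)."""
    wins = sum(t > c for t in treated for c in control)
    losses = sum(t < c for t in treated for c in control)
    return wins / losses

# Hypothetical times (days) for tumors to reach a volume threshold.
treated = [21, 25, 30, 18, 27]
control = [15, 17, 22, 12, 19]
wr = win_ratio(treated, control)
print(wr)  # wins per loss across all treated-control pairs
```

The time-dependent version additionally handles censoring by restricting comparisons to pairs that are decidable at a common follow-up time, and the clustered extension reweights pairs so animals with many tumors do not dominate.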
Keywords
Win ratio
Composite
Semi-parametric
Preclinical
Proportional odds
Growth models
About 90% of human cancer deaths are due to metastasis. To date, immune checkpoint inhibitors (ICIs) are one of the frontier treatments that have improved the survival of metastatic cancer patients with few side effects. However, the objective response rate for ICIs is low, only ~30% in urothelial carcinoma (UC), highlighting the need to identify signatures for response prediction. Several state-of-the-art signatures have been revealed in first-tier journals, demonstrating the area's importance. As the number of genes (features; ~20,000) greatly exceeds the sample sizes of training sets (≤300), we first developed feature selection procedures to reduce the features to a few hundred. Next, we trained several classifiers using IMvigor210, comprising RNA-seq and clinical data of ~298 patients with mUC, and the selected genes, via 5-fold cross-validation. In particular, our predictor based on logit regression (LogitDA) with the revealed signature achieved a prediction AUC of 0.75; our signature outperformed the known signatures (e.g., PD-L1, PD-1, IFNG, tGE8, T exhaust, and T inflamed). Overall, our findings show that LogitDA and our signature predict immunotherapy response well in mUC.
Keywords
biomarker
cancer
machine learning
regression
prediction
Linear regression is a standard topic in statistical education. One application of linear regression in science and engineering is calibration curve modeling, for example in chemistry. When creating a calibration curve, the technician creates multiple replicates of the response at fixed values of the predictor. A technique such as least squares is then used to estimate the calibration curve. This curve is estimated with error, and that error estimate is used in other parts of the calibration analysis. Although not a recommended practice, the calibration curve is sometimes fit using averages of the response instead of the original observations. We discuss how to explore the differences in these approaches visually, through simulation, and theoretically with STEM students.
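The contrast between the two fitting approaches can be demonstrated in a short simulation. The standards, replicate counts, and true line below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
# Calibration design: 5 fixed standards, 4 replicate responses each.
x_levels = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
reps = 4
x_raw = np.repeat(x_levels, reps)
y_raw = 2.0 + 3.0 * x_raw + rng.normal(0, 0.5, x_raw.size)

# Fit 1: least squares on all replicate observations.
slope_raw, intercept_raw = np.polyfit(x_raw, y_raw, 1)

# Fit 2: least squares on the per-level averages (the shortcut).
y_avg = y_raw.reshape(-1, reps).mean(axis=1)
slope_avg, intercept_avg = np.polyfit(x_levels, y_avg, 1)

# With a balanced design the point estimates coincide exactly...
print(np.allclose([slope_raw, intercept_raw], [slope_avg, intercept_avg]))
# ...but the averaged fit has far fewer residual degrees of freedom
# (5 - 2 versus 20 - 2), so the estimated error differs between fits.
```

This is the crux of the classroom discussion: averaging leaves the estimated line unchanged in a balanced design, yet it discards the within-level replication that the calibration error analysis depends on.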
First Author
Megan Heyman, Rose-Hulman Institute of Technology
Presenting Author
Megan Heyman, Rose-Hulman Institute of Technology
Health communication through data visualization is one of the most important skills public health professionals should have to help their communities effectively. However, rigorous training on data visualization and statistics is rare in public health degree programs across the country. There is a critical need to account for graphic literacy levels in the general population for effective communication of complex health issues. We performed a cross-sectional exploratory study to assess the graphic literacy of a nationally representative sample (N=524) in the United States and preference for data visualization types applied to COVID-19 variant proportion data. Results showed that graphic literacy levels, as measured by the Short Graphic Literacy Scale, were lower than previously measured. Those with higher graphic literacy were more likely to select bar and pie charts, while those with lower graphic literacy were more likely to select other chart types. These findings contribute to the development of educational strategies for effective health communication for public health students, enabling them to combat misinformation and reduce health disparities among disadvantaged populations.
Keywords
Statistical Communication
Data Visualization
Health Communication
Public Health
Education in the Health Sciences
Proliferative Diabetic Retinopathy (PDR), the advanced stage of diabetic retinopathy (DR), causes abnormal retinal vessel growth and vision loss. Accurately identifying incident PDR in electronic health records is important for disease monitoring and evaluating interventions. This study evaluates classification methods for identifying incident PDR cases, using the UCSF De-identified Clinical Data Warehouse. Patients aged ≥18 with at least one DR diagnosis by an eye provider and available de-identified clinical notes were included. A total of 321 patients were randomly selected for chart review by an ophthalmologist (gold standard), confirming 158 PDR cases. Six methods were evaluated: first ICD9/10 code with no lookback period, first ICD9/10 code with a one-year lookback period in any department, first ICD9/10 code with a one-year lookback period in ophthalmology, rule-based NLP on clinical notes, the best-performing ICD9/10 method combined with NLP, and a generative AI model. Each method will be compared against the gold standard using sensitivity, specificity, PPV, NPV, and F1 score. The proposed methodologies will provide insights into the use of structured and unstructured data for identifying incident PDR.
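The evaluation metrics named above all derive from the 2x2 table of each method's calls against the chart-review gold standard. This sketch uses tiny hypothetical labels, not the study's data.

```python
def confusion_metrics(gold, pred):
    """Sensitivity, specificity, PPV, NPV, and F1 against a gold standard."""
    tp = sum(g and p for g, p in zip(gold, pred))
    tn = sum(not g and not p for g, p in zip(gold, pred))
    fp = sum(not g and p for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    sens = tp / (tp + fn)   # of true PDR cases, fraction flagged
    spec = tn / (tn + fp)   # of non-cases, fraction correctly unflagged
    ppv = tp / (tp + fp)    # of flagged patients, fraction truly PDR
    npv = tn / (tn + fn)    # of unflagged patients, fraction truly non-PDR
    f1 = 2 * ppv * sens / (ppv + sens)
    return sens, spec, ppv, npv, f1

# Hypothetical chart-review labels vs. one classifier's calls.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
sens, spec, ppv, npv, f1 = confusion_metrics(gold, pred)
print(sens, spec, ppv, npv, f1)
```

Comparing these five numbers across the six candidate methods shows the trade-offs between, for example, a high-sensitivity NLP rule and a high-PPV ICD-code definition.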
Keywords
Electronic Health Records (EHR)
Ophthalmology
Incident Disease