Contributed Poster Presentations: Biometrics Section

Chair: Shirin Golchi, McGill University
 
Monday, Aug 4: 10:30 AM - 12:20 PM
4048 
Contributed Posters 
Music City Center 
Room: CC-Hall B 

Main Sponsor

Biometrics Section

Presentations

01: A Simulation Method for Sample Size Determination in Calibration Studies of Prognostic Biomarkers

Evaluating the calibration of prognostic biomarkers is crucial for assessing how well predicted risks match observed event rates. A common approach bins predicted risks and compares the observed event rates in each bin against the predicted rates. However, small sample sizes can yield wide confidence intervals (CIs), which can obscure whether observed deviations reflect actual miscalibration or random variation.

A simulation framework was developed to determine the sample size needed for reliable estimation of bin-specific event rates. We leverage assumptions on the marginal distribution of biomarker risk groups (RGs), the conditional bin distribution per RG, and event rates within RGs. Using a Dirichlet-multinomial process, individuals are assigned to an RG and then randomly allocated to bins conditional on the RG assignment. Event times are then generated from an exponential survival model to calculate bin-level survival estimates and CIs at a fixed time point.

The method provides practical guidance for choosing a sample size that ensures robust calibration assessment, using an approach adaptable to different bin schemes and biomarker assumptions. 
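The simulation loop this abstract describes (Dirichlet-multinomial risk-group assignment, conditional bin allocation, exponential event times, bin-level CIs at a fixed time) can be sketched in a few lines of NumPy. All parameter values, the function name, and the no-censoring simplification below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_bin_ci_halfwidths(n, t0=5.0,
                               alpha=(5.0, 3.0, 2.0),
                               bin_probs=((0.8, 0.2), (0.5, 0.5), (0.1, 0.9)),
                               rg_rates=(0.02, 0.06, 0.15)):
    """One replicate: draw the risk-group (RG) mix from a Dirichlet, assign n
    subjects to RGs (Dirichlet-multinomial), allocate each to one of two bins
    given the RG, draw exponential event times, and return the 95% CI
    half-width for S(t0) in each bin (normal approximation, no censoring)."""
    p_rg = rng.dirichlet(alpha)                    # marginal RG distribution
    rg = rng.choice(len(alpha), size=n, p=p_rg)    # RG assignment
    bins = (rng.random(n) < np.asarray(bin_probs)[rg, 1]).astype(int)
    times = rng.exponential(1.0 / np.asarray(rg_rates)[rg])
    halfwidths = []
    for b in range(2):
        surv = times[bins == b] > t0               # still event-free at t0
        s_hat = surv.mean()
        halfwidths.append(1.96 * np.sqrt(s_hat * (1.0 - s_hat) / len(surv)))
    return halfwidths

# Grow n until the bin-level CI half-widths are acceptably narrow.
for n in (100, 400, 1600):
    reps = np.array([simulate_bin_ci_halfwidths(n) for _ in range(200)])
    print(n, reps.mean(axis=0).round(3))
```

Repeating this over a grid of sample sizes, and stopping once the widest bin-level half-width falls below a target, yields the kind of sample-size guidance the abstract describes.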

Keywords

Calibration

Sample Size

Biomarker

Survival Analysis

Simulation

Confidence Interval 

Abstracts


Co-Author(s)

Yi Ren
Huei-Chung Huang, ArteraAI

First Author

Alexander Piehler, ArteraAI

Presenting Author

Alexander Piehler, ArteraAI

02: Analysis of Analytic Treatment Interruption Trials with Misclassified Interval-Censored Data

Analytical treatment interruption (ATI) trials involve interruption of antiretroviral therapy (ART) in order to evaluate the efficacy of novel treatments for HIV. In these trials, the primary outcome of interest is often the time to viral rebound: the time from ART interruption until the participant's viral load crosses a predetermined threshold. Due to the discrete nature of clinical visit schedules, this time to viral rebound is interval-censored, though many ATI studies implicitly right-impute the failure time. Furthermore, measurement error and the non-monotonicity of the HIV viral load trajectory may lead the first threshold-crossing event to be missed entirely by the intermittent viral load measurements. We conduct a simulation study to evaluate the performance of the Cox proportional hazards model and log-rank test on misclassified interval-censored data, both with and without right imputation of the rebound time. We also investigate how aspects of the ATI trial design may mediate the impact of these data characteristics on the Cox model and log-rank test performance. Finally, we make recommendations based on our results for best practices for ATI trials. 
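The two data features the abstract studies, interval censoring from a discrete visit grid and misclassification from missed threshold crossings, can be illustrated with a short NumPy sketch. The visit schedule, rebound-time scale, and miss probability below are invented for illustration, not taken from any ATI trial:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical ATI-like setup: true rebound times in days, visits every 7 days.
n = 500
true_t = rng.exponential(scale=21.0, size=n)     # true time to viral rebound
visits = np.arange(0, 85, 7)                     # discrete clinic-visit schedule

# Interval censoring: rebound is only known to lie between adjacent visits.
right_idx = np.clip(np.searchsorted(visits, true_t), 1, len(visits) - 1)
left, right = visits[right_idx - 1], visits[right_idx]   # (left, right] interval

# Misclassification: with some probability the first threshold crossing is
# missed (assay noise, non-monotone viral load), pushing detection a visit later.
missed = rng.random(n) < 0.2
observed_right = np.where(missed, np.minimum(right + 7, visits[-1]), right)

# Right imputation treats the first detection visit as the exact failure time,
# which systematically overstates time to rebound relative to the truth.
print("mean true rebound time:  ", round(true_t.mean(), 1))
print("mean right-imputed time: ", round(observed_right.mean(), 1))
```

Feeding `true_t` versus the right-imputed times into a Cox model or log-rank test is the kind of comparison the simulation study performs.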

Keywords

HIV/AIDS

clinical trials

interval censoring

misclassification

Cox proportional hazards model

log-rank test 

Co-Author

Kaitlyn Cook, Smith College

First Author

Skyler Williams, Smith College

Presenting Author

Skyler Williams, Smith College

03: Assessing the Robustness of AR Models in the Presence of Non-normality: A Simulation Study

In time series modeling, it is common to assume that innovations follow a normal distribution. However, this assumption does not always hold in real-world scenarios. Environmental datasets, in particular, often contain extreme values that violate normality. Through a comprehensive simulation study, we demonstrate that traditional AR(q) models can produce inaccurate results when innovations deviate from normality, especially when they exhibit skewness. Our findings highlight that outliers can distort estimates, introduce bias, and compromise the generalizability of results. 
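A minimal sketch of the kind of simulation the abstract describes: an AR(1) series driven by skewed (centred lognormal) innovations, fit with the lag-1 autocorrelation (Yule-Walker) estimator. The innovation distribution and parameter values are illustrative choices, not the authors' design:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(phi, innovations):
    """Generate an AR(1) series x_t = phi * x_{t-1} + e_t, started at zero."""
    x = np.zeros(len(innovations))
    for t in range(1, len(x)):
        x[t] = phi * x[t - 1] + innovations[t]
    return x

phi_true, n = 0.6, 2000
# Skewed innovations: centred lognormal draws mimic the heavy right tails
# common in environmental data.
e_skew = rng.lognormal(mean=0.0, sigma=1.0, size=n)
e_skew -= e_skew.mean()

x = simulate_ar1(phi_true, e_skew)
phi_hat = np.corrcoef(x[:-1], x[1:])[0, 1]   # lag-1 (Yule-Walker) estimate
print(f"phi_hat = {phi_hat:.3f} (true value {phi_true})")
```

Comparing the sampling distribution of `phi_hat` under normal versus skewed innovations, across skewness levels and outlier contamination rates, is the shape of the robustness study summarized above.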

Keywords

Autoregressive Models

Robustness

Outliers

Skew distributions 

Co-Author

Evrim Oral, LSUHSC School of Public Health

First Author

Mohamed Mohamed, Louisiana State University Health Science Center

Presenting Author

Mohamed Mohamed, Louisiana State University Health Science Center

04: Autoregressive Linear Mixed Effects Models and Dynamic Models for Longitudinal Data Analysis

Longitudinal data and panel data are both obtained by repeatedly measuring a response variable over time for multiple subjects, and analytical methods for these data have been developed separately in several disciplines. In recent years, methodological integration has been occurring. Here, we focus on dynamic models in which the previous response value, that is, the lagged dependent variable, appears on the right-hand side of the equation. We compare our proposed autoregressive linear mixed effects models (Funatogawa et al., 2007; Funatogawa and Funatogawa, 2019) with similar dynamic models used in several fields. Our proposed model extends the linear mixed effects model by combining it with autoregression, and has been developed with the main aim of expressing changes in responses over time. We have also provided the state-space representation and the relationships with nonlinear mixed effects models. In panel data analysis, which is used in observational studies, stable unobservable individual characteristics and the lagged dependent variable are often used for adjustment purposes. In this study, we focus on likelihood-based methods. 

Keywords

Autoregressive

Dynamic

Longitudinal

Panel Data 

Co-Author

Takashi Funatogawa, Chugai Pharmaceutical Co Ltd.

First Author

Ikuko Funatogawa, Institute of Statistical Mathematics

Presenting Author

Ikuko Funatogawa, Institute of Statistical Mathematics

05: Bayesian Hierarchical Borrowing for Platform Trials with Non-Linear Longitudinal Outcomes

Platform trials are multi-arm designs that simultaneously evaluate multiple treatments for a single disease under a shared protocol, benefiting from control data borrowing to improve statistical efficiency. Longitudinal outcomes can provide more precise estimates and increase statistical power for platform analysis. However, relatively few studies have addressed borrowing approaches for longitudinal outcomes in platform trials. In pain and depression studies, outcome trajectories are often nonlinear. To address these issues, we extend Bayesian hierarchical borrowing methods (BHM) to longitudinal endpoints, incorporating nonlinear features within a causal inference framework. We investigate the performance and benefits of various pooling methods: simple pooling, eligible pooling, BHM, and incorporating flexible and adaptable patient-level weights in BHM, in comparison to no borrowing methods. The BHM framework introduces hierarchical structures to balance the extent of borrowing based on data similarity across regimens, optimizing inference while maintaining Type I error control. Simulation results will be presented to compare different borrowing approaches. 

Keywords

Bayesian Hierarchical Borrowing

platform trials

Longitudinal

non-linear 

Co-Author

Thaddeus Tarpey, NYU Grossman School of Medicine

First Author

Xiaoting Xing, NYU Grossman School of Medicine

Presenting Author

Xiaoting Xing, NYU Grossman School of Medicine

06: Bayesian Sparse Regression for the Association of Microbiome Profiles with Metabolite Abundance

Numerous studies have shown that microbial metabolites, which represent the products of bacteria in the human gut, play a key role in shaping cancer risk and response to treatment. However, metabolite data typically contain a large proportion of missing values which are often recorded as zeros. These missing values may result from either low abundance or technical challenges in data processing. Moreover, given the compositionality of microbiome data, where the observed abundances can only be interpreted on a relative scale, standard variable selection methods are not applicable. In this project, we propose a novel Bayesian method to address challenges in both metabolite and microbiome data. Key features of our proposed model include adopting a z-prior to address the compositional characteristics of microbiome data and modeling the two different mechanisms of missing metabolite data. We demonstrate on simulated data that our proposed model can impute the unobserved true metabolite values and correctly select the relevant microbiome predictors. We illustrate our method on real data from a study on the interplay between the microbiome and metabolome in colorectal cancer. 

Keywords

Bayesian variable selection

Compositional covariates

Metabolome outcome

Microbiome data analysis

Missing value imputation 

Co-Author

Christine Peterson, University of Texas MD Anderson Cancer Center

First Author

Kai Jiang, The University of Texas Health Science Center at Houston

Presenting Author

Kai Jiang, The University of Texas Health Science Center at Houston

07: Boon of Dimensionality in Bayesian Heritability Estimation

In the frequentist framework, Jiang et al. (2016) established the asymptotic properties of the restricted maximum likelihood (REML) estimator under misspecified linear mixed models (LMMs), demonstrating the consistency of the REML estimator for heritability. Our study extends these results to the Bayesian paradigm by considering a non-informative prior on the error variance. We derive the Bayesian marginal maximum likelihood estimator (MMLE) for the signal-to-noise ratio (SNR) and analyze its concentration properties.

Our analysis establishes that the Bayesian MMLE exhibits asymptotic consistency properties analogous to those of the REML estimator. Furthermore, we derive non-asymptotic convergence rates for the Bayesian MMLE, elucidating its behavior under model misspecification, particularly in high-dimensional settings. These results have direct implications for variable selection, uncertainty quantification in hierarchical models, and signal detection in complex data structures. 

Keywords

Bayesian estimation

Restricted Maximum Likelihood Estimator (REML)

Model Misspecification

Signal-to-Noise Ratio (SNR)

Marginal Maximum Likelihood Estimator

Asymptotic Consistency 

Co-Author(s)

Quan Zhou, Texas A&M University
Anirban Bhattacharya, Texas A&M University

First Author

Sayantan Roy, Texas A&M University

Presenting Author

Sayantan Roy, Texas A&M University

08: Clustering Multivariate Discrete Data with Partial Records

Being able to cluster data with incomplete records is vital in many disciplines. Here, we develop a model-based approach for clustering multivariate discrete data with missing entries using a mixture of multivariate Poisson lognormal distributions. A multivariate Poisson lognormal distribution is a hierarchical Poisson distribution that can account for over-dispersion and can model the correlation between variables. To illustrate its effectiveness, we designed a variety of simulation studies showing the robustness of the new method under different percentages of incomplete records and patterns of missing data. Additionally, the approach is used to cluster partial records from a proteomics dataset. 

Keywords

Clustering

Missing Data

Discrete Data

Multivariate Poisson Log Normal Distribution 

Co-Author(s)

Utkarsh Dang, University of Guelph
Sanjeena Dang, Carleton University

First Author

Kevin Giddings

Presenting Author

Kevin Giddings

09: Clustering of functional data prone to complex heteroscedastic measurement error

Several factors make clustering functional data challenging, including the infinite dimensional space to which observations belong and the lack of a defined probability density function for functional random variables. Despite extensive literature describing clustering methods for functional data, clustering of error-prone functional data remains poorly explored. We propose a two-stage approach: first, clustered mixed-effects models are applied to adjust for measurement-error bias; second, cluster analysis is applied to measurement error–adjusted curves. Readily available methods (e.g., K-means, mclust) can be used to perform the cluster analysis. We use simulations to examine how complex heteroscedastic measurement error affects clustering, considering variations in sample sizes, error magnitudes, and correlation structures. Our results show that ignoring measurement error in functional data reduces the accuracy of identifying true latent clusters. When applied to a school-based study of energy expenditure among elementary school–aged children in Texas, our methods achieved enhanced clustering of energy expenditure. 

Keywords

clustering

functional data

measurement error

physical activity

wearable device 

Co-Author(s)

Lan Xue, Oregon State University
Roger Zoh, Indiana University
Mark Benden, Texas A&M University
Carmen Tekwe, Indiana University

First Author

Andi Mai

Presenting Author

Andi Mai

10: Compared to what?: Statistical considerations for evaluating change in the urinary microbiome

The discovery of the urinary microbiome has ignited investigations into the mechanisms of disease and potential interventions for improved bladder health. Characterizing a patient's "baseline" is key to properly evaluate change due to exposures and treatments such as probiotics, antibiotics, and surgical intervention. In our previously published prospective observational study, we found that urological surgery altered the urinary microbiome, with differences in recovery to baseline in premenopausal versus postmenopausal women. This study is a secondary analysis of these data, capitalizing on additional samples to describe assay variability, evaluate stability across multiple available baseline samples (screening, pre-operative), and estimate the minimal detectable change in key microbiome features (diversity indices and prevalence/relative abundance of specific microbes). These data can inform those conducting longitudinal clinical studies in this field, where for convenience a urine sample at a single timepoint is most often collected to establish a baseline. Power analyses and sampling design should account for expected variability of the dynamic ecosystem in the bladder. 

Keywords

microbiome

clinical research 

First Author

Cara Joyce

Presenting Author

Cara Joyce

11: Estimating Obesity's Effect on Chronic Disease Using a Copula Model with Device-Measured Activity

Analyzing the effect of obesity on chronic disease risk is challenging due to endogeneity, measurement error, and complex dependencies between obesity, physical activity, and health outcomes. Standard statistical methods, such as generalized linear models, often fail to adequately address these issues, leading to biased estimates. To overcome these limitations, we develop a bivariate semi-parametric recursive copula model that flexibly accounts for non-linear relationships and intricate dependency structures. We evaluate the finite sample properties of our approach through simulation studies and apply it to NHANES 2011–2014, incorporating device-measured physical activity to enhance estimation accuracy. Results confirm the robustness of our method and reinforce the causal association between obesity and chronic disease risk. This study highlights the importance of advanced statistical techniques for improving average treatment effect (ATE) estimation in epidemiological research. 

Keywords

Obesity

cardiovascular disease

semi-parametric recursive copula model

endogeneity

physical activity

diabetes 

Co-Author(s)

Roger S Zoh, Indiana University
Carmen Tekwe, Indiana University

First Author

Xiaoxin Yu

Presenting Author

Xiaoxin Yu

12: Estimation of Heterogeneous Causal Mediation Effects in the Presence of High Dimensional Covariates

Understanding treatment effects through biological pathways is an essential objective in biomedical investigation. Causal mediation analysis (CMA) provides a useful framework for such inquiries. However, the natural direct effect (NDE) and natural indirect effect (NIE) may depend on specific patient characteristics. To account for such heterogeneity, we include covariate-treatment and mediator-treatment interactions in the outcome model. We relax the strict hierarchical constraint by including interactions without requiring the corresponding main effects. NDE and NIE are then calculated for given values of the covariates. To maintain model parsimony in the presence of high dimensional covariates, we apply generalized LASSO regularization to select key covariate-treatment interactions. Simulation studies show that the method has good performance in selecting the interactions. The method can properly stratify individuals and achieve unbiased estimates for the NDE and NIE. The method represents a step forward in understanding the heterogeneity in the mediation pathway of the treatment within personalized medicine. Data from a real clinical study were used to illustrate the method. 

Keywords

Causal Mediation Analysis

Heterogeneous Treatment Effects

Generalized LASSO

High-Dimensional Covariates

Natural Direct and Indirect Effects

Personalized Medicine 

Co-Author(s)

Yi Zhao, Indiana University School of Medicine
Wanzhu Tu, Indiana University School of Medicine

First Author

Chengyun Li

Presenting Author

Chengyun Li

13: Evaluating ML Approaches for Assessing Heterogeneity of Treatment Effect in Clinical Trials

Inferring heterogeneity of treatment effect is a popular secondary aim of clinical trials. While there are several methods available to estimate conditional average treatment effects (CATEs) in clinical trials, they are often applied in settings with lower sample sizes than were included in the corresponding seminal methodological work, making the validity of inference in these settings unclear. To provide practical guidance, we conducted a simulation study to evaluate the performance of different estimators for the CATE, including ordinary least squares (OLS) and causal forests, in a variety of settings. We evaluated 95% confidence interval coverage, bias, and variance under linear and non-linear data generating mechanisms (DGMs) in the presence of 0-40 nuisance covariates and 0-16 effect-modifying covariates. We found that while tree-based ensembles like causal forests can be quite flexible to linear or nonlinear settings, they can have meaningfully impaired coverage in many settings at the sample sizes typical of most trial applications. As expected, OLS has superior performance under linear DGMs but poor performance under nonlinear DGMs. We conclude with recommendations. 
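For the linear-DGM case, the OLS-with-interactions estimator the abstract evaluates can be sketched in a few lines; the data-generating values and variable names below are illustrative assumptions, not the study's actual settings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated trial under a linear DGM: one effect modifier (x1), one nuisance
# covariate (x2); all coefficient values are illustrative.
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
a = rng.integers(0, 2, size=n)                 # 1:1 randomized treatment
tau = 1.0 + 0.5 * x1                           # true CATE as a function of x1
y = 0.3 * x2 + a * tau + rng.normal(size=n)

# OLS with a treatment-by-covariate interaction targets the CATE directly here.
X = np.column_stack([np.ones(n), x1, x2, a, a * x1])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
cate_hat = beta[3] + beta[4] * x1              # estimated tau(x1) per subject
print("treatment main effect:", beta[3].round(2), "| modifier:", beta[4].round(2))
```

Under a nonlinear DGM this parametric estimator is misspecified, which is where the flexible tree-based ensembles compared in the study come in.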

Keywords

heterogeneous treatment effects

causal forests

machine learning

simulation

causal inference 

Co-Author(s)

Andrew Spieker, Vanderbilt University Medical Center
Bryan Blette, Vanderbilt University Medical Center

First Author

Lisa Levoir, Vanderbilt University

Presenting Author

Lisa Levoir, Vanderbilt University

14: External data sources and their utilization in the clinical environment: past, present and future

In the past couple of years, the clinical research outlook has evolved, including rising costs for developing new treatments. This has led the industry to seek innovations that help alleviate these costs without sacrificing scientific reliability. Among those I will examine, though not exclusively, are real-world data / real-world evidence and natural history studies. The discussion will cover the benefits, pitfalls, and unique considerations associated with each, as well as the current stance of regulatory authorities towards them. Each of these innovations involves its own statistical methodology considerations. I will also look at real examples where such sources have been used in the industry thus far (the good, the bad, and the ugly). Lastly, I will discuss what this could imply for the future and for the practicing statistician.  

Keywords

Real-World Data / Real-World Evidence

Purpose and utilization of natural history studies (i.e. external controls, and retrospective and prospective studies).

Sensitivity/covariate analysis between external data and recruited controls.

Doubly debiased machine learning to estimate average treatment effect (ATE)

Stratification score (the estimated probability of the outcome given potential confounders) matching of external controls to mimic internal controls. 

First Author

Ian Lees, MMS Holdings, Inc

Presenting Author

Ian Lees, MMS Holdings, Inc

15: Flexible Individualized Treatment Strategies in Micro Randomized Trials with Binary Rewards

Micro-randomized trials (MRTs) are often used in mHealth studies to assess app-based interventions. Participants are randomized to receive treatment at a series of decision points, traditionally using the same rule across individuals. Several recent MRTs utilize Thompson Sampling (TS), a reinforcement learning algorithm, to build individualized treatment strategies that optimize delivery with respect to a reward. Treatment may interact with several contextual features, but estimation of models in this setting can be unreliable. This is especially difficult with a binary reward where complete separation often occurs, even with a large sample and few features. We present an approach to balance algorithmic flexibility and computational cost in the context of a binary reward that (1) uses partial pooling and weakly informative priors that apply more shrinkage to higher-order interactions and (2) considers the amount of information available in the data when defining a model. Our approach is useful in MRTs where the TS algorithm must be automated. We demonstrate the empirical utility of our method in a digital twin of an ongoing MRT study, LowSalt4Life, compared to logical alternatives. 

Keywords

Mobile health

Micro-randomized trials

Clinical trials

Reinforcement learning

Individualized treatment 

Co-Author(s)

Rachel Gonzalez, University of Michigan
Walter Dempsey, University of Michigan
Scott Hummel, University of Michigan
Brahmajee Nallamothu, University of Michigan
Michael Dorsch, University of Michigan

First Author

Madeline Abbott

Presenting Author

Rachel Gonzalez, University of Michigan

16: Martingale R-learner: estimating time-varying heterogeneous treatment effects for survival data

Future precision medicine requires accurate assessment of the explainable variability in treatment effects, known as heterogeneous treatment effects (HTE), to guide optimal clinical decisions at the individual level. Measuring HTE by the ratio of survival probabilities under a structural failure time model, we develop a martingale R-learner to estimate HTE. Our martingale R-learner incorporates flexible estimators for 1) the marginal survival or cumulative hazard capturing the association between outcome and confounders, and 2) the time-varying propensity score in risk sets, which enables leveraging advances in machine learning. To reduce the impact of estimation bias in these two nuisance models on HTE, we propose a Neyman orthogonal score based on an orthogonal decomposition of conditional-model martingale residuals into residuals of the propensity score and the marginal-model martingale. The resulting martingale R-learner attains the quasi-oracle property, i.e., the estimation error of the nuisance models has no impact on HTE if their estimators are consistent at an o(n^(-1/4)) rate. Numerical experiments in various settings demonstrated valid empirical performance consistent with the theoretical properties. 

Keywords

heterogeneous treatment effect

causal inference

survival analysis

orthogonal score 

Co-Author(s)

Jue Hou
Ronghui Xu, University of California-San Diego

First Author

Yuchen Qi, UC San Diego, Department of Family Medicine & Public Health

Presenting Author

Yuchen Qi, UC San Diego, Department of Family Medicine & Public Health

17: Novel Approaches for Random-Effects Meta-Analysis of a Small Number of Studies Under Normality

Random-effects meta-analyses with only a few studies often face challenges in accurately estimating between-study heterogeneity, leading to biased effect estimates and confidence intervals with poor coverage. This issue is especially acute for rare diseases. To address this problem for normally distributed outcomes, two new approaches have been proposed to provide confidence limits for the global mean: one based on fiducial inference, and the other involving two modifications of the signed log-likelihood ratio test statistic to improve performance with small numbers of studies. The performance of the proposed methods was evaluated numerically and compared with the Hartung-Knapp-Sidik-Jonkman (HKSJ) approach and its modification for handling small numbers of studies. Simulation results indicated that the proposed methods achieved coverage probabilities closer to the nominal level and produced shorter confidence intervals than existing methods. Two real data examples illustrate the application of the proposed methods. 
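For context, the HKSJ comparator mentioned in the abstract can be computed in a few lines: a DerSimonian-Laird estimate of between-study heterogeneity followed by the HKSJ rescaled variance and a t interval. The five studies below are made-up numbers, purely for illustration:

```python
import numpy as np

# Illustrative random-effects meta-analysis of k = 5 studies (made-up data).
yi = np.array([0.10, 0.55, -0.20, 0.45, 0.75])   # study effect estimates
vi = np.array([0.04, 0.09, 0.05, 0.06, 0.08])    # within-study variances
k = len(yi)

# DerSimonian-Laird estimate of the between-study variance tau^2
wi = 1.0 / vi
mu_fe = np.sum(wi * yi) / np.sum(wi)
Q = np.sum(wi * (yi - mu_fe) ** 2)
c = np.sum(wi) - np.sum(wi ** 2) / np.sum(wi)
tau2 = max(0.0, (Q - (k - 1)) / c)

# HKSJ: random-effects weights, rescaled variance, and a t(k-1) interval
wi_re = 1.0 / (vi + tau2)
mu_re = np.sum(wi_re * yi) / np.sum(wi_re)
q = np.sum(wi_re * (yi - mu_re) ** 2) / (k - 1)
se_hksj = np.sqrt(q / np.sum(wi_re))
t_crit = 2.776                                    # t quantile: 0.975, df = 4
ci = (mu_re - t_crit * se_hksj, mu_re + t_crit * se_hksj)
print(f"mu = {mu_re:.3f}, tau^2 = {tau2:.3f}, "
      f"95% HKSJ CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The fiducial and modified signed log-likelihood ratio intervals the abstract proposes would replace the last four lines while keeping the same inputs.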

Keywords

confidence interval

fiducial inference

modified LRT statistic

small sample asymptotics

rare diseases 

Co-Author(s)

Thomas Mathew, University of Maryland-Baltimore
Demissie Alemayehu, Pfizer
Ge Cheng

First Author

Yajie Duan, Eli Lilly and Company

Presenting Author

Yajie Duan, Eli Lilly and Company

18: Propensity Score-Based Stratified Win Ratio for Augmented Control Designs

This project proposes a propensity score (PS)-based stratified win ratio method to address the challenges of small patient populations in clinical trials, especially for rare or pediatric diseases, by incorporating external control data. Our approach enhances traditional win ratio analysis by leveraging PS stratification to account for heterogeneity between the current and external studies. Additionally, down-weighting based on the overlapping coefficient of the PS distributions of the current treatment and external control groups further mitigates bias due to patient heterogeneity. Our simulations showed significant improvements in statistical power for detecting treatment effects within the composite endpoint over non-borrowing and pooling methods, with Mantel-Haenszel (MH)-type weights achieving the highest power. The proposed methods are also applied to an Amyotrophic Lateral Sclerosis (ALS) study incorporating an external control arm from a prior ALS trial. The proposed PS-based stratified win ratio method thus provides a rigorous framework for borrowing external data and analyzing composite endpoints with limited patient availability. 
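A toy sketch of a stratified win ratio with MH-type stratum weights, in the spirit of the approach described here: a single continuous outcome where larger is better, two invented strata, and the commonly used n_t*n_c/(n_t+n_c) weighting from the stratified win ratio literature rather than the authors' exact specification:

```python
import numpy as np

rng = np.random.default_rng(3)

def win_loss_proportions(y_trt, y_ctl):
    """All pairwise treatment-vs-control comparisons; larger outcome = win."""
    diff = y_trt[:, None] - y_ctl[None, :]
    return (diff > 0).mean(), (diff < 0).mean()

# Toy data: two propensity-score strata, each with a positive treatment shift.
strata = []
for shift in (0.5, 0.3):
    strata.append((rng.normal(loc=shift, size=40),   # treated outcomes
                   rng.normal(loc=0.0, size=60)))    # external-control outcomes

# Mantel-Haenszel-type stratum weights: n_t * n_c / (n_t + n_c)
num = den = 0.0
for y_t, y_c in strata:
    p_win, p_loss = win_loss_proportions(y_t, y_c)
    w = len(y_t) * len(y_c) / (len(y_t) + len(y_c))
    num += w * p_win
    den += w * p_loss
print("stratified win ratio:", round(num / den, 2))
```

In the actual method, the pairwise comparison would follow a composite-endpoint hierarchy, and the stratum weights would additionally be down-weighted by the PS overlap coefficient.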

Keywords

Placebo borrowing

Win ratio

Composite endpoint analysis

Propensity score stratification 

Co-Author(s)

Joon Jin Song, Baylor University
Yingdong Feng, Eli Lilly and Company
Michael Sonksen
Tuo Wang, Eli Lilly and Company

First Author

Yurong Chen, Baylor University

Presenting Author

Yurong Chen, Baylor University

19: Real Data-Driven, Robust Survival Analysis on Patients with Parkinson's Disease

Parkinson's Disease (PD) is a devastating neurodegenerative disorder that affects millions of people around the globe. Researchers are continuously working to understand PD and to develop treatments that improve the day-to-day lives of PD patients. In recent decades, Deep Brain Stimulation (DBS) has given promising results for motor symptoms, improving the quality of daily living of PD patients. In the present study, we utilized nonparametric, semiparametric, and robust parametric survival analysis to extract useful information about the long-term survival outcomes of patients who underwent DBS for PD. We conclude that the probabilistic behavior of the survival time of female patients is statistically different from that of male patients. Furthermore, we identified that the survival times of female patients are characterized by the 3-parameter lognormal distribution, while those of male patients are characterized by the 3-parameter Weibull distribution. 

Keywords

Survival Analysis

Cox PH

Deep Brain Stimulation

Movement Disorder

Parkinson’s Disease

Parametric, Non-Parametric and Semi-Parametric Survival Analysis 

Co-Author(s)

Dilmi Abeywardana
Chris Tsokos, Distinguished University Professor-USF

First Author

Malinda Iluppangama

Presenting Author

Malinda Iluppangama

20: The Kernel Regression Tree-Exploring Aggregations Estimator for Microbiome Analysis

We introduce a novel Kernel Regression estimator, Kernel Regression with Tree-Exploring Aggregations (KR TEXAS), that learns a distance metric while allowing feature aggregation along a predefined tree structure. This approach is particularly relevant for microbiome analysis, where data is often collected at multiple taxonomic levels, and determining the appropriate level of aggregation is non-trivial. Unlike traditional aggregation methods that rely on uniform taxonomic levels, KR TEXAS leverages an L1-penalized distance metric to selectively aggregate features based on their importance, leading to biologically interpretable results. Our method extends prior work on metric learning and nonparametric regression, incorporating structured feature aggregation to improve predictive accuracy and interpretability. We demonstrate the utility of KR TEXAS through both simulations and real microbiome datasets, highlighting its advantages in capturing functional relationships that may be missed by conventional aggregation techniques. 

Keywords

Microbiome

Compositional Data

Kernel Regression

Metric Learning 

Co-Author(s)

Y. Samuel Wang, Cornell University
Martin Wells, Cornell University

First Author

Sithija Manage, Cornell University

Presenting Author

Sithija Manage, Cornell University