SPEED 5: Causal Inference, Precision Medicine and Clinical Trials, Part 1

Lance Ballester, Chair
Lotus Clinical Research
 
Tuesday, Aug 5: 8:30 AM - 10:20 AM
4095 
Contributed Speed 
Music City Center 
Room: CC-104A 

Presentations

A Doubly-HDPHMM Framework to Study Heterogeneous Population and Individual Rest-Activity Behaviors

The growing availability of actigraphy data (recordings of human movement) monitored by wearable devices has driven the development of advanced statistical models to quantify human rest-activity behaviors. Key features from 24-hour actigraphy data, used as digital biomarkers, are linked to metabolic and neurodegenerative diseases. Hidden Markov models (HMMs) have recently been applied to actigraphy data as an effective framework for modeling individual rest-activity patterns. We propose a Doubly Hierarchical Dirichlet Process HMM (Doubly HDPHMM) framework that (1) infers the number of hidden activity states for both individuals and the study population using HDP priors, eliminating the assumption of a fixed number of states that may not suit all subpopulations, and (2) allows flexible incorporation of covariates such as health outcomes in state-specific distributions, enabling simultaneous individual- and population-level inference. Using NHANES 2011-2014 actigraphy data, our model distinguishes sleep, sedentary, and physically active behaviors, revealing nuanced within- and between-individual variations and offering insights into complex and heterogeneous rest-activity patterns.
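
For contrast with the fixed-state assumption the abstract relaxes, the sketch below fits a conventional hidden Markov model with the number of states K fixed in advance, using the depmixS4 package on simulated minute-level activity data. This is a minimal illustration of the baseline model class, not the authors' Doubly HDPHMM, and all data and parameter values are invented.

```r
# Illustrative contrast only: a conventional HMM with the number of states
# K *fixed* in advance, fit to simulated minute-level log-activity with
# depmixS4. The Doubly HDPHMM in the abstract instead infers K via HDP
# priors; this sketch shows the fixed-K baseline that assumption implies.
library(depmixS4)

set.seed(1)
# Simulate 3 latent behaviors (sleep / sedentary / active) over 3 days,
# with sticky transitions and state-specific mean log-activity.
P <- matrix(c(0.990, 0.008, 0.002,
              0.010, 0.980, 0.010,
              0.005, 0.015, 0.980), 3, 3, byrow = TRUE)
states <- numeric(3 * 1440)
states[1] <- 1
for (t in 2:length(states)) states[t] <- sample(1:3, 1, prob = P[states[t - 1], ])
mu  <- c(0.2, 1.5, 3.0)
dat <- data.frame(logact = rnorm(length(states), mu[states], 0.4))

# Fit a 3-state Gaussian HMM; K = 3 must be supplied, which is exactly
# the assumption the HDP construction removes.
mod <- depmix(logact ~ 1, data = dat, nstates = 3, family = gaussian())
fm  <- fit(mod, verbose = FALSE)
summary(fm)
# Decoded vs. true states (agreement up to label switching):
table(posterior(fm, type = "viterbi")$state, states)
```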

Keywords

24-Hour Actigraphy Data

Hidden Markov Models

Nonparametric Bayesian

Rest-Activity Behaviors 

Co-Author(s)

Qian Xiao, University of Texas Health Science Center at Houston
Cici Bauer, University of Texas Health Science Center at Houston

First Author

Jiachen Lu, Merck & Co, Inc

Presenting Author

Jiachen Lu, Merck & Co, Inc

An R package for Survival-based Gene Set Enrichment Analysis

Functional enrichment analysis is often used to assess the effect of experimental differences. However, researchers sometimes want to understand the relationship between transcriptomic variation and health outcomes such as survival. We propose Survival-based Gene Set Enrichment Analysis (SGSEA) to identify biological functions associated with survival in a disease. Although the approach is conceptually straightforward, there are no standard tools or software to perform the analysis. We developed an R package and Shiny app called SGSEA and present a study of kidney renal clear cell carcinoma (KIRC) to demonstrate the approach. Unlike traditional Gene Set Enrichment Analysis (GSEA), which ranks genes by log-fold change, SGSEA ranks them by hazard ratio. Our study shows that pathways enriched with genes whose increased transcription is associated with mortality (NES > 0, adjusted p-value < 0.15) have previously been linked to KIRC survival, demonstrating the value of this approach. The method allows rapid identification of disease-variant pathways and provides information complementary to standard GSEA, all within a single R package or the accompanying app.
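
A minimal sketch of the core SGSEA idea, assuming simulated data and standard packages (survival, fgsea) rather than the SGSEA package's actual interface: rank genes by their Cox model log hazard ratio instead of log-fold change, then run pre-ranked GSEA on that ranking.

```r
# Core idea of SGSEA, sketched by hand (not the package's actual API):
# per-gene Cox models give log hazard ratios, which replace log fold
# changes as the ranking statistic for pre-ranked GSEA.
library(survival)
library(fgsea)

set.seed(1)
n_genes <- 500; n_pat <- 200
expr <- matrix(rnorm(n_genes * n_pat), n_genes, n_pat,
               dimnames = list(paste0("gene", 1:n_genes), NULL))
time   <- rexp(n_pat, rate = exp(0.3 * expr["gene1", ]))  # gene1 drives hazard
status <- rbinom(n_pat, 1, 0.7)

# Per-gene univariate Cox models; the coefficient is the log hazard ratio.
log_hr <- apply(expr, 1, function(g)
  coef(coxph(Surv(time, status) ~ g)))

# Hypothetical gene sets; in practice these would come from MSigDB/KEGG.
pathways <- list(setA = paste0("gene", 1:25),
                 setB = paste0("gene", 101:125))

res <- fgsea(pathways = pathways, stats = sort(log_hr), minSize = 5)
res[, c("pathway", "NES", "padj")]
```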

Keywords

Gene Set Enrichment Analysis (GSEA)

R package

Shiny App

Pathway enrichment analysis

Survival outcomes

Transcriptomic variation

Biological functions 

Co-Author

Jeffrey Thompson, Department of Biostatistics and Data Science at KUMC

First Author

Xiaoxu Deng

Presenting Author

Xiaoxu Deng

Causal indirect effect of an HIV curative treatment: mediators subject to an assay limit and measurement error

Causal mediation analysis decomposes the total effect of a treatment on an outcome into the indirect effect, operating through the mediator, and the direct effect, operating through other pathways. One can estimate the pure or organic indirect effect by combining a hypothesized treatment effect on the mediator with outcome data without treatment. This methodology holds significant promise in selecting prospective treatments based on their indirect effect for further evaluation in randomized clinical trials.
We apply this methodology to assess which of two measures of HIV persistence is a more promising target for future HIV curative treatments. We combine a hypothesized treatment effect on two mediators, and outcome data without treatment, to compare the indirect effect of treatments targeting these mediators. Some HIV persistence measurements fall below the assay limit, leading to left-censored mediators. To address this issue, we assume that the outcome model extends to mediators below the assay limit and use maximum likelihood estimation. To address measurement error in the mediators, we adjust our estimates. Using data from completed ACTG studies, we estimate the pure or organic indirect effect of potential curative HIV treatments on viral suppression through weeks 4 and 8 after HIV medication interruption, mediated by two HIV persistence measures. 
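
The left-censoring device described above can be illustrated with a minimal Tobit-style likelihood: observed mediator values contribute a density term, and values below the assay limit contribute a cumulative-probability term. The sketch below is a toy version with a normal mediator and invented numbers, not the abstract's full mediation model.

```r
# Minimal sketch of the left-censoring idea: maximum likelihood for a
# normal mediator M with values below an assay limit L recorded only as
# "< L". Observed points contribute a density term, censored points a
# CDF term. Purely illustrative; the abstract's models are richer.
set.seed(1)
n <- 300
L <- -1                        # assay lower limit (log scale)
m <- rnorm(n, mean = -0.5, sd = 1)
obs   <- m >= L                # below-limit values are left-censored
m_obs <- ifelse(obs, m, L)

negloglik <- function(par) {
  mu <- par[1]; sigma <- exp(par[2])   # log-sd keeps optim unconstrained
  ll_obs  <- sum(dnorm(m_obs[obs], mu, sigma, log = TRUE))
  ll_cens <- sum(!obs) * pnorm(L, mu, sigma, log.p = TRUE)  # P(M < L) per censored obs
  -(ll_obs + ll_cens)
}
fit <- optim(c(0, 0), negloglik, method = "BFGS")
c(mu_hat = fit$par[1], sd_hat = exp(fit$par[2]))  # close to (-0.5, 1)
```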

Keywords

causal mediation analysis

causal inference

assay lower limit

measurement error

HIV/AIDS

indirect effects 

Co-Author(s)

Ronald Bosch, Harvard T.H. Chan School of Public Health
Judith Lok, Boston University

First Author

Vindyani Herath, Boston University

Presenting Author

Vindyani Herath, Boston University

Statistical Inference for Binary Outcomes in Two-Sample Summary-Data Mendelian Randomization

Mendelian randomization (MR) is a powerful tool for evaluating causal effects in the presence of unmeasured confounding. With the ever-growing sample sizes in genome-wide association studies, there is a rising trend to perform MR analyses using summary data from genetic associations across diverse phenotypes. Traditional two-sample summary-data MR methods require that the genetic variants employed satisfy the exclusion restriction, a condition frequently violated due to pleiotropy. Although several approaches have been introduced to mitigate this issue, existing methods still fall short when it comes to precisely estimating causal effect sizes for binary outcomes. In this study, we introduce a novel statistical method specifically designed for binary outcome data within the two-sample summary-data MR framework, addressing challenges that commonly arise in practical applications. We demonstrate the efficacy of our method through extensive simulations under various scenarios and provide a comprehensive comparison with current methodologies.
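
As background for the binary-outcome refinement proposed here, the sketch below implements the standard inverse-variance weighted (IVW) two-sample summary-data MR estimator on simulated summary statistics; the abstract's new method is not shown.

```r
# Background sketch: the inverse-variance weighted (IVW) estimator for
# two-sample summary-data MR, i.e. a weighted regression of variant-outcome
# associations on variant-exposure associations through the origin, on
# simulated summary statistics.
set.seed(1)
J      <- 50                     # number of genetic instruments
beta_x <- rnorm(J, 0.1, 0.03)    # SNP-exposure associations
theta  <- 0.4                    # true causal effect
se_y   <- rep(0.02, J)           # SEs of SNP-outcome associations
beta_y <- theta * beta_x + rnorm(J, 0, se_y)

ivw <- sum(beta_x * beta_y / se_y^2) / sum(beta_x^2 / se_y^2)
se  <- sqrt(1 / sum(beta_x^2 / se_y^2))
c(estimate = ivw, se = se, z = ivw / se)
```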

Keywords

mendelian randomization

binary outcome

summary data

pleiotropy effects

causal inference 

Co-Author

An-Shun Tai, Institute of Statistics and Data Science, National Tsing Hua University

First Author

Chen-Hua Cho, National Tsing Hua University

Presenting Author

Chen-Hua Cho, National Tsing Hua University

Causal mediation analysis of non-mortality outcomes with follow-up truncated by death

In the context of mediation analysis, the presence of death-truncated variables poses a challenge, as conventional measures fail to accurately assess the role of a mediator in the effect of a treatment on a primary non-mortality outcome. This study introduces novel estimands, survivor natural direct and indirect effects, to address this issue. Exchangeability assumptions are employed to mitigate confounding effects, and empirical expressions are derived using information from a pretreatment surrogate variable akin to an instrumental variable. Three estimation approaches (model parameterization, generalized method of moments, and data-adaptive G-computation) are developed and applied to data from the National Emphysema Treatment Trial to illustrate the proposed method.
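
For readers unfamiliar with the G-computation building block, the sketch below performs plain parametric G-computation of natural direct and indirect effects with linear models on simulated data. It deliberately omits the death-truncation machinery (survivor strata and the surrogate variable) that is the paper's actual contribution.

```r
# Minimal G-computation sketch for natural direct/indirect effects with
# linear models and simulated data; the survivor-stratum estimands and
# surrogate-variable identification in the abstract are NOT shown.
set.seed(1)
n <- 5000
a <- rbinom(n, 1, 0.5)               # randomized treatment
m <- 0.8 * a + rnorm(n)              # mediator model
y <- 1.0 * a + 0.5 * m + rnorm(n)    # outcome model

fm <- lm(m ~ a); fy <- lm(y ~ a + m)

# Plug fitted mediator means under each treatment into the outcome model.
mu_m <- function(aval) predict(fm, data.frame(a = aval))
mu_y <- function(aval, mval) predict(fy, data.frame(a = aval, m = mval))
nie <- mu_y(1, mu_m(1)) - mu_y(1, mu_m(0))   # natural indirect effect
nde <- mu_y(1, mu_m(0)) - mu_y(0, mu_m(0))   # natural direct effect
c(NIE = unname(nie), NDE = unname(nde))      # approx. 0.4 and 1.0
```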

Keywords

Causal mediation analysis

Data-adaptive G-computation

Death truncation

Non-mortality outcome

Survivor natural direct and indirect effects 

First Author

An-Shun Tai, Institute of Statistics and Data Science, National Tsing Hua University

Presenting Author

An-Shun Tai, Institute of Statistics and Data Science, National Tsing Hua University

Comparison of Nonlinear Mendelian Randomization for Causal Inference

Mendelian randomization (MR) uses genetic variants as instrumental variables (IVs) to infer causal effects between an exposure and an outcome based on observational data. While various MR methods have been proposed and applied in recent years, most rely on the assumption of a linear relationship between the exposure and outcome, though this relationship may actually be nonlinear. In this study, we compare several nonlinear IV regression approaches, such as spline-based models, polynomial regression, and deep learning techniques, alongside two stratification-based nonlinear MR methods, doubly-ranked stratification and residual stratification, for estimating localized average causal effects (LACE). These methods are evaluated for their accuracy, efficiency, and robustness in handling complex, nonlinear relationships between the exposure, instruments, and outcome. Our findings provide valuable insights into the performance of these methods, guiding the selection of the most appropriate approach for nonlinear causal inference in MR.
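
One of the two stratification-based methods compared, residual stratification, can be sketched in a few lines: regress the exposure on the instrument, stratify subjects by the residual, and compute a Wald ratio within each stratum as a localized average causal effect. The simulation below uses an invented quadratic exposure-outcome relationship.

```r
# Sketch of residual stratification for nonlinear MR on simulated data
# with a quadratic causal curve (local slope = 0.6 * x).
set.seed(1)
n <- 20000
g <- rbinom(n, 2, 0.3)            # instrument (SNP dosage)
u <- rnorm(n)                     # unmeasured confounder
x <- 0.5 * g + u + rnorm(n)       # exposure
y <- 0.3 * x^2 + u + rnorm(n)     # nonlinear causal effect

res     <- resid(lm(x ~ g))       # instrument-free exposure variation
stratum <- cut(res, quantile(res, 0:5 / 5), include.lowest = TRUE, labels = 1:5)

lace <- sapply(levels(stratum), function(s) {
  i <- stratum == s
  cov(y[i], g[i]) / cov(x[i], g[i])   # Wald ratio within the stratum
})
round(lace, 2)   # increases across strata, tracking the nonlinear slope
```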

Keywords

Causal effects

Genetic variants

Genome-Wide Association Studies (GWAS)

Transcriptome-Wide Association Studies (TWAS) 

Co-Author

Wei Pan, University of Minnesota

First Author

Yizeng Li, University of South Carolina

Presenting Author

Yizeng Li, University of South Carolina

Evaluating the Heterogeneity of Treatment Effects Across Subgroups in the Presence of Missing Data

In clinical trials, it is important to understand whether treatment effects are consistent across subgroups defined by key baseline factors. However, there is a lack of proper statistical methodology for testing treatment effect heterogeneity when multiple imputation is used to handle missing data. Moreover, treatment effect heterogeneity is traditionally tested by adding a treatment-by-subgroup interaction to the primary analysis model, but recently published analysis models designed for improved estimation efficiency can be too complicated for such interaction terms to be added properly. In this article, we propose a separate-model framework that tests the heterogeneity of treatment effects across subgroups by constructing a chi-square statistic from the inferential results of models fit within each subgroup. Our proposed approach controls the type I error rate well by properly accounting for the correlations introduced during multiple imputation and is applicable to all analysis models. The performance of the proposed method is evaluated using simulations, and the method is applied to a real clinical trial.
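
The generic shape of such a separate-model test can be sketched as follows, assuming stand-in per-imputation subgroup estimates: pool each subgroup's treatment effect with Rubin's rules, then test equality of the pooled effects with a chi-square statistic. The paper's careful handling of imputation-induced correlation (e.g., via bootstrap) is precisely the part this naive version omits.

```r
# Simplified separate-model heterogeneity test. est[m, g] and se[m, g]
# stand in for the effect estimate and SE from the model fit in subgroup
# g on imputed dataset m.
set.seed(1)
G <- 3; M <- 20                       # subgroups, imputations
true_eff <- c(1, 1, 1)                # homogeneous effects under H0
est <- sapply(1:G, function(g) rnorm(M, true_eff[g], 0.1))
se  <- matrix(0.30, M, G)

qbar <- colMeans(est)                 # Rubin's rules: pooled estimates
ubar <- colMeans(se^2)                # within-imputation variance
bvar <- apply(est, 2, var)            # between-imputation variance
tvar <- ubar + (1 + 1 / M) * bvar     # total variance

# Contrast each subgroup effect against the first; V is diagonal here
# because subgroup models use disjoint subjects (pre-correction).
C <- cbind(-1, diag(G - 1))
V <- diag(tvar)
d <- C %*% qbar
Q <- drop(t(d) %*% solve(C %*% V %*% t(C)) %*% d)
pchisq(Q, df = G - 1, lower.tail = FALSE)   # heterogeneity p-value
```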

Keywords

Subgroup

Separate Model

Rubin’s Rule

Bootstrap 

Co-Author(s)

Ruth Huh, Eli Lilly and Company
Qiwei Wu, Eli Lilly and Company
Yongming Qu, Eli Lilly and Company

First Author

Jianghao Li, Eli Lilly and Company

Presenting Author

Jianghao Li, Eli Lilly and Company

Expanding the Application of Propensity Scores: A Study of a Multi-Treatment Matching Package

Nonrandomized studies often suffer from confounding, as the lack of random assignment necessitates statistical techniques to approximate controlled experiments for valid causal inference. If confounding is mishandled, studies may falsely attribute the effect of a confounder to the exposure, leading to incorrect conclusions. To improve causal inference, such studies must mimic randomized controlled trials to ensure valid comparisons between treated and untreated groups. Propensity score methods are widely used to mitigate confounding by balancing covariates. While traditional approaches focus on binary treatments, multi-treatment settings introduce complexities in estimation and matching. This research develops a novel algorithm and R package for multi-treatment propensity score matching, integrating logistic regression, machine learning, and advanced matching methods. We evaluate performance across varying data structures and confounding levels using simulated and real-world datasets, measuring balance diagnostics, bias reduction, and treatment effects. These findings advance multi-treatment propensity score methods, offering a more robust framework for causal inference in observational studies.
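
The underlying ingredients, though not the authors' package, can be sketched as follows: estimate generalized propensity scores for a three-arm treatment with multinomial logistic regression (nnet::multinom here), then nearest-neighbor match each unit in one arm to one unit in each other arm on the score vector.

```r
# Generalized propensity scores + simple multi-arm matching on simulated
# data; a hand-rolled sketch, not the package described in the abstract.
library(nnet)

set.seed(1)
n  <- 900
x1 <- rnorm(n); x2 <- rnorm(n)
lp <- cbind(0, 0.8 * x1, -0.6 * x2)            # arm-specific linear predictors
tr <- apply(lp, 1, function(r) sample(1:3, 1, prob = exp(r) / sum(exp(r))))
d  <- data.frame(tr = factor(tr), x1, x2)

gps <- predict(multinom(tr ~ x1 + x2, data = d, trace = FALSE),
               type = "probs")                 # n x 3 matrix of scores

# For each unit in arm 1, find the closest unit (Euclidean distance in
# GPS space) in arms 2 and 3.
match_to <- function(i, arm) {
  pool <- which(d$tr == arm)
  pool[which.min(colSums((t(gps[pool, ]) - gps[i, ])^2))]
}
arm1 <- which(d$tr == "1")
triplets <- t(sapply(arm1, function(i) c(i, match_to(i, "2"), match_to(i, "3"))))
head(triplets)   # matched index triplets for downstream balance checks
```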

Keywords

Propensity Scores

Matching

Machine Learning

Treatment Comparison

Multivariate Regression

Balancing 

Co-Author

Bong-Jin Choi, North Dakota State University

First Author

Lizzy Rono, North Dakota State University

Presenting Author

Lizzy Rono, North Dakota State University

Investigating the Impact of Digital Courseware on Learning and Engagement in Introductory Statistics

This study examines the impact of digital courseware on undergraduate students' learning, engagement, and satisfaction in an introductory statistics course. It compares outcomes between students using the courseware and those receiving traditional instruction, investigating whether features such as self-assessments, personalized study plans, and formative practice with feedback enhance learning and engagement more effectively than conventional methods. The study also explores how incorporating real-world examples and authentic datasets influences student satisfaction and the perceived relevance of course content. Differences in performance, engagement, and satisfaction between the two groups will be assessed using final exam scores and course evaluations, while qualitative interviews with students who used the digital courseware will offer deeper insights into their experiences, the applicability of course content to real-world contexts, and overall course satisfaction. Study findings will help identify best practices for integrating technology and data-driven learning into undergraduate statistics education. 

Keywords

digital courseware

introductory statistics

high impact practices

student engagement

formative assessment

feedback 

Co-Author

Vanessa Peters Hinton, Digital Promise

First Author

Zaher Kmail, University of Washington-Tacoma

Presenting Author

Zaher Kmail, University of Washington-Tacoma

Matching-Adjusted Indirect Comparison for Time-to-Event Endpoints

To support Health Technology Assessment (HTA) submissions, we often need to conduct indirect treatment comparisons (ITCs). One of the most popular ITC methods is the Matching-Adjusted Indirect Comparison (MAIC), in which individual patient data (IPD) from one trial and aggregate data (AgD) from another trial are compared on endpoints of interest, adjusting for between-trial differences in the distributions of covariates that influence outcomes. In this work, we propose a unified pseudo-value approach for MAIC with time-to-event (TTE) endpoints, including survival rates, restricted mean survival time, and competing risks.
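
Two standard building blocks of such an approach can be sketched on fabricated data: Signorovitch-style MAIC weights that re-balance IPD covariate means to the AgD means, and jackknife pseudo-values for the restricted mean survival time, which a weighted model can then analyze. The unified treatment across survival rate, RMST, and competing risks is the abstract's contribution and is not reproduced here.

```r
# (1) MAIC weights by method of moments; (2) RMST pseudo-values via
# leave-one-out Kaplan-Meier areas. All numbers are invented.
library(survival)

set.seed(1)
n <- 400
X <- cbind(age = rnorm(n, 60, 8), male = rbinom(n, 1, 0.6))  # IPD covariates
agd_means <- c(age = 64, male = 0.70)                        # AgD target means

# Weights w = exp(Xc %*% a) with Xc centered at the AgD means; minimizing
# sum(exp(Xc %*% a)) forces the weighted means to match the target.
Xc  <- sweep(X, 2, agd_means)
a   <- optim(c(0, 0), function(a) sum(exp(Xc %*% a)), method = "BFGS")$par
w   <- drop(exp(Xc %*% a))
colSums(w * X) / sum(w)                                      # ~ agd_means

# RMST pseudo-values at tau via jackknife.
time <- rexp(n, 0.05); status <- rbinom(n, 1, 0.8); tau <- 24
rmst <- function(t, s)
  summary(survfit(Surv(t, s) ~ 1), rmean = tau)$table["rmean"]
theta_all <- rmst(time, status)
pseudo <- sapply(1:n, function(i)
  n * theta_all - (n - 1) * rmst(time[-i], status[-i]))

# A weighted analysis of the pseudo-values then gives a MAIC-adjusted RMST.
weighted.mean(pseudo, w)
```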

Keywords

competing risks

HTA

MAIC

restricted mean survival time

time-to-event 

Co-Author

Yixin Fang, AbbVie

First Author

Moming Li, AbbVie

Presenting Author

Moming Li, AbbVie

Methods for Estimating VE Using Routine School Testing Data with Differential Testing Behavior

During the COVID-19 pandemic, many school systems implemented opt-in regular testing for students to track the spread of disease and detect cases early. Beyond the primary use of these testing programs for surveillance, the observational data they collect can be leveraged to measure vaccine effectiveness (VE) among school-aged children. Data from these sources pose complicated challenges to the standard assumptions of vaccine effectiveness methodology, particularly when there is evidence of differential testing behavior between the vaccinated and unvaccinated groups. To address this issue, we explore approaches to characterize the differences in testing behavior and improve the implementation of standard VE methodology. We apply three methods for measuring VE to the observational data: a target trial emulation approach with matching of participants across vaccination groups, a time-varying effect model of vaccination, and a test-negative design. For these methods we compare the losses in sample size due to study design, discuss approaches to adjust for differential testing behavior, and consider additional sources of bias due to unmet assumptions.
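
Of the three designs, the test-negative design is the most compact to sketch: restrict to tested, symptomatic individuals and regress the test result on vaccination, reporting VE = 1 - odds ratio. The simulation below (all parameters invented) builds in vaccination-dependent testing propensity; because that propensity does not depend on the cause of illness here, the case/non-case contrast cancels it, which is the design's usual rationale.

```r
# Test-negative design sketch on invented data, with differential testing
# behavior by vaccination status (but not by cause of illness).
set.seed(1)
n     <- 20000
vax   <- rbinom(n, 1, 0.6)
inf   <- rbinom(n, 1, plogis(-2 - 0.7 * vax))   # target infection
other <- rbinom(n, 1, plogis(-2))               # test-negative illness
p_test <- ifelse(vax == 1, 0.8, 0.5)            # vaccinated test more readily
tested <- rbinom(n, 1, (inf | other) * p_test)  # only the ill may test

d   <- data.frame(case = inf, vax = vax)[tested == 1, ]
fit <- glm(case ~ vax, family = binomial, data = d)
# VE on the odds-ratio scale; close to the true value here because the
# testing differential applies equally to cases and test-negatives.
c(VE_hat = unname(1 - exp(coef(fit)["vax"])), VE_true = 1 - exp(-0.7))
```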

Keywords

Vaccine Effectiveness

Target Trial Emulation

Test-negative Design

Time-varying Effect

COVID-19

Observational Study 

Co-Author(s)

Paige Harton, Emory University
Allison Chamberlain, Emory University, Rollins School of Public Health
Elizabeth Rogawski-McQuade, Emory University
Natalie Dean, Emory University

First Author

Amy Moore

Presenting Author

Amy Moore

Multivariate proteome-wide association study to identify causal proteins for Alzheimer’s disease

Alzheimer's disease (AD) is a complex and progressive neurodegenerative disorder that accounts for the majority of dementia cases. Here we aim to identify causal plasma proteins for AD, shedding light on its etiology. We utilized the latest large-scale plasma proteomic data from the UK Biobank Pharma Proteomics Project and AD GWAS summary data from the International Genomics of Alzheimer's Project. Via a univariate instrumental variable (IV) regression method, we identified causal proteins through cis-pQTLs and through (both cis- and trans-) pQTLs. To further reduce potential false positives due to high linkage disequilibrium among some pQTLs and high correlations among some proteins, we developed a multivariate IV regression method, called 2-Stage Constrained Maximum Likelihood (MV-2ScML), to distinguish direct and confounding effects of proteins; key features of the method include its robustness to invalid IVs and applicability to GWAS summary data. Our work highlights differences between using cis-pQTLs and trans-pQTLs and the critical role of multivariate analysis in detecting causal proteins with direct effects, providing insights into plasma protein pathways to AD.
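
As a point of reference for the IV regression machinery, the sketch below runs textbook individual-level two-stage least squares with AER::ivreg on simulated pQTL data. MV-2ScML itself works from GWAS summary data, models many correlated proteins jointly, and tolerates invalid instruments, none of which this baseline attempts.

```r
# Textbook 2SLS baseline with one protein exposure and SNP instruments,
# on simulated individual-level data (the abstract's method uses summary
# data and is far more general).
library(AER)

set.seed(1)
n  <- 5000
g1 <- rbinom(n, 2, 0.3); g2 <- rbinom(n, 2, 0.4)   # pQTL instruments
u  <- rnorm(n)                                     # confounder
prot <- 0.4 * g1 + 0.3 * g2 + u + rnorm(n)         # plasma protein level
ad   <- 0.25 * prot - u + rnorm(n)                 # outcome (liability scale)

summary(ivreg(ad ~ prot | g1 + g2))                # 2SLS recovers ~0.25
confint(lm(ad ~ prot))["prot", ]                   # naive OLS is confounded
```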

Keywords

2ScML

2SLS

constrained maximum likelihood

instrumental variable (IV)

pleiotropy 

Co-Author(s)

Haoran Xue, City University of Hong Kong
Zhaotong Lin
Wei Pan, University of Minnesota

First Author

Lei Fang

Presenting Author

Lei Fang

Non-parametric Evaluation of Contextually Optimal Decision-Aids

The growth of AI- and ML-based clinical decision tools provides an array of decision-aid agents that can be implemented in a clinician's decision-making process. However, few tools exist for context-specific evaluation of how well these agents align with clinicians' workflows, and thus there is no method to identify an optimal set of aligned agents to adopt. Our work adopts the multinomial logit choice (MNL) model as a framework for evaluating agent alignment and identifying an optimal agent set. We assume that selections among a set of agents are observed according to a context-dependent MNL model, characterized by context-dependent preference parameters. We propose a standard regularized maximum likelihood estimation (MLE) procedure, providing a uniform convergence rate over a bounded context space. Additionally, when agent-specific utility parameters or functions are known, an optimal assortment of agents can be identified. This work provides novel estimates of the context-specific alignment of decision-making agents, drawing on results in relevance-weighted likelihood, uniform rates in non-parametric kernel regression, and previous results for the static MNL model.
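
The static ingredient of this framework, maximum likelihood for a plain MNL model, can be sketched directly: choice probabilities are a softmax of agent utilities, and the log-likelihood is maximized numerically. The context-dependent, relevance-weighted, regularized estimation and the assortment optimization are the paper's contributions and are not shown.

```r
# MLE for a static multinomial logit choice model over K agents, with the
# first utility fixed at 0 for identifiability. Simulated selections.
set.seed(1)
K <- 4; n <- 2000
util <- c(0, 0.5, 1.0, -0.3)                       # true utilities
p    <- exp(util) / sum(exp(util))                 # softmax choice probabilities
y    <- sample(1:K, n, replace = TRUE, prob = p)   # observed agent selections

negloglik <- function(v) {                         # v = utilities of agents 2..K
  u <- c(0, v)
  -sum(u[y]) + n * log(sum(exp(u)))
}
fit <- optim(rep(0, K - 1), negloglik, method = "BFGS")
round(c(0, fit$par), 2)                            # recovers util up to noise
```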

Keywords

decision-aids

non-parametric regression

relevance-weighted likelihood

optimal assortment 

Co-Author

Junwei Lu, Harvard T.H. Chan School of Public Health

First Author

Dominic DiSanto, Harvard T.H. Chan School of Public Health

Presenting Author

Dominic DiSanto, Harvard T.H. Chan School of Public Health

Personalized mediation effect model for heterogeneous mobile health data on stress

Pregnancy is a significant period in a woman's life, often accompanied by both mental and physical stressors. Identifying mediators of stress-related associations is crucial for early intervention and improved maternal health outcomes. The growing use of wearable devices enables continuous monitoring of heart rate variability (HRV), sleep patterns, and physical activity.

This study aims to assess the heterogeneity introduced by individual behavioral patterns in wearable device data. Specifically, our research investigates potential mediators of the associations of age (≥30) and BMI (≥25) with stress during the second and third trimesters of pregnancy. An individualized mediation effect approach incorporating subgrouping is proposed to identify relevant mediators, including daily step count, deep sleep, REM sleep, and a weekly negative-emotions score derived from an ecological momentary assessment (EMA) questionnaire. Additionally, time-varying mediation models are used to capture dynamic changes in the mediation effects. By integrating these methods, this research aims to enhance our understanding of stress-related health disparities during pregnancy and support the development of more personalized interventions.
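
A single-mediator building block of such an analysis can be sketched with the mediation package on fabricated data (exposure = age ≥ 30, mediator = daily step count, outcome = stress score); the study's subgroup-based individualized models and time-varying mediation models extend well beyond this.

```r
# Single-mediator analysis with the `mediation` package on invented data;
# ACME is the average indirect effect through step count.
library(mediation)

set.seed(1)
n      <- 500
age30  <- rbinom(n, 1, 0.5)                    # exposure group indicator
steps  <- 8000 - 1500 * age30 + rnorm(n, 0, 1000)
stress <- 20 + 2 * age30 - 0.001 * steps + rnorm(n, 0, 3)
d <- data.frame(age30, steps, stress)

m_med <- lm(steps ~ age30, data = d)
m_out <- lm(stress ~ age30 + steps, data = d)
fit <- mediate(m_med, m_out, treat = "age30", mediator = "steps",
               sims = 200)
summary(fit)   # ACME here is about (-1500) * (-0.001) = 1.5
```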

Keywords

mediation effect model

heterogeneous data

individualized model

wearable device

subgroup analysis

mobile health 

First Author

Cadence Pinkerton

Presenting Author

Cadence Pinkerton

Quantile growth mixture modeling of weight loss for bariatric surgery patients

The goal of Growth Mixture Modeling (GMM) is to identify underlying latent groups of units that are qualitatively different in their growth trajectories. Among the various assumptions needed for GMM to work, one that is often taken for granted is that the residuals of the growth curve portion are normally distributed. Kim et al. showed that violations of this assumption can have serious consequences for GMMs. Most notably, one may arrive at the incorrect number of latent classes due to "the relationship between class membership recovery and the proportion of outliers" in the sample of interest. Consequently, the use of traditional mean-based GMM could lead to misleading conclusions not just about the qualitative differences between latent classes but, more fundamentally, about the number of latent classes itself. More robust approaches, such as median-based (and, by extension, quantile-based) GMM, are therefore essential advancements to consider. In this paper we extend the median GMM to arbitrary quantiles of the weight loss distribution for a bariatric surgery cohort by leveraging the location-scale mixture representation of the Asymmetric Laplace Distribution.
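
The quantile-regression core being extended here can be sketched without the mixture layer: fitting the tau-th quantile growth curve minimizes the check loss, whose minimizer coincides with the maximum-likelihood fit under an Asymmetric Laplace error distribution, the representation the abstract exploits. Data below are simulated weight-change trajectories.

```r
# Quantile growth curves via check-loss minimization (quantreg::rq) on
# simulated weight-change trajectories; the GMM/mixture layer is not shown.
library(quantreg)

set.seed(1)
n_id <- 100; months <- 0:12
d <- do.call(rbind, lapply(1:n_id, function(i) {
  drop_rate <- rnorm(1, -2, 0.8)               # % weight change per month
  data.frame(id = i, t = months,
             wtchg = drop_rate * months + rnorm(length(months), 0, 3))
}))

# Median (tau = .5) and 90th-percentile growth curves of weight change.
for (tau in c(0.5, 0.9)) print(coef(rq(wtchg ~ t, tau = tau, data = d)))

# The check loss rho_tau(u) = u * (tau - 1{u < 0}); its minimizer equals
# the Asymmetric Laplace maximum-likelihood fit.
check <- function(u, tau) u * (tau - (u < 0))
sum(check(resid(rq(wtchg ~ t, tau = 0.5, data = d)), 0.5))
```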

Keywords

Growth mixture modeling

Quantile regression

Growth curve modeling 

Co-Author

Karen Coleman, Kaiser Permanente Southern California

First Author

Ernest Shen, Kaiser Permanente

Presenting Author

Ernest Shen, Kaiser Permanente

Reference-based imputation methods integrated with mixed models for addressing intercurrent events

In the estimand framework, reference-based imputation (RBI) methods are recommended under a hypothetical strategy to indicate unfavorable outcomes for patients with intercurrent events (ICEs). Traditionally, RBI methods are used as sensitivity analyses to explore deviations from the missing at random (MAR) assumption. This presentation explores the integration of RBI with mixed models for repeated measures (MMRMs) in primary analyses for continuous longitudinal endpoints.
Different RBI methods (e.g., jump to reference, copy increments in reference) will be applied to specific ICEs (e.g., death, adverse events) with categorical time MMRMs for analyzing changes at a pre-specified time point or with continuous time MMRMs for analyzing the rate of change over time. Simulation studies will evaluate the operating characteristics of these models. Case studies will demonstrate the application of the proposed RBI methods integrated with MMRMs in real-world scenarios, highlighting strengths and limitations, and clarifying interpretation of results. 
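
The "jump to reference" idea can be sketched for a single patient under a simple multivariate normal working model: post-ICE visits are imputed from the reference arm's distribution, conditional on the patient's observed earlier visits. Production analyses would estimate these parameters from an MMRM (packages such as rbmi automate this); every number below is invented.

```r
# Hand-rolled jump-to-reference (J2R) sketch for one patient: impute
# missing visits 3-4 from the reference arm's MVN distribution given the
# observed visits 1-2, via the conditional-normal formula
#   mu_mis + S_mo S_oo^{-1} (y_obs - mu_obs).
mu_ref <- c(0, -1, -2, -3)                        # reference-arm means, visits 1-4
Sigma  <- 4 * (0.6 ^ abs(outer(1:4, 1:4, "-")))   # AR(1)-type covariance

y_obs <- c(0.5, -2.0)                             # observed at visits 1-2
obs   <- 1:2; mis <- 3:4                          # visits 3-4 missing after the ICE

cond_mean <- mu_ref[mis] +
  Sigma[mis, obs] %*% solve(Sigma[obs, obs], y_obs - mu_ref[obs])
cond_var  <- Sigma[mis, mis] -
  Sigma[mis, obs] %*% solve(Sigma[obs, obs], Sigma[obs, mis])
drop(cond_mean)   # J2R-imputed expectations for visits 3 and 4
```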

Keywords

reference-based imputation

mixed model for repeated measures

intercurrent events 

First Author

Delia Voronca, Regeneron

Presenting Author

Delia Voronca, Regeneron