Tuesday, Aug 5: 8:30 AM - 10:20 AM
4095
Contributed Speed
Music City Center
Room: CC-104A
Presentations
The growing availability of wearable device-monitored actigraphy data (human activity movements) has driven the development of advanced statistical models to quantify human rest-activity behaviors. Key features from 24-hour actigraphy data, used as digital biomarkers, are linked to metabolic and neurodegenerative diseases. Hidden Markov models (HMM) have recently been applied to actigraphy data as an effective framework for modeling individual rest-activity patterns. We propose a Doubly Hierarchical Dirichlet Process HMM (Doubly HDPHMM) framework that (1) infers the number of hidden activity states for both individuals and the study population using HDP priors, eliminating the assumption of a fixed number of states that may not suit all subpopulations, and (2) allows flexible incorporation of covariates such as health outcomes in state-specific distributions, enabling simultaneous individual and population-level inference. Using NHANES 2011-2014 actigraphy data, our model distinguishes sleep, sedentary, and physically active behaviors, revealing nuanced within- and between-individual variations and offering insights into complex and heterogeneous rest-activity patterns.
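As a rough sketch of the prior layer involved (our notation, not necessarily the authors' exact construction), an HDP-HMM ties every state's transition distribution to a shared global measure via stick-breaking:

\[
\beta \mid \gamma \sim \mathrm{GEM}(\gamma), \qquad
\pi_k \mid \alpha, \beta \sim \mathrm{DP}(\alpha, \beta), \quad k = 1, 2, \dots,
\]

where \beta is a population-level distribution over states and \pi_k is the transition distribution out of state k. Sharing \beta lets the number of occupied states be inferred from the data rather than fixed in advance; the "doubly" hierarchical version adds an individual-versus-population layer on top of this hierarchy.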
Keywords
24-Hour Actigraphy Data
Hidden Markov Models
Nonparametric Bayesian
Rest-Activity Behaviors
Co-Author(s)
Qian Xiao, University of Texas Health Science Center at Houston
Cici Bauer, University of Texas Health Science Center at Houston

First Author
Jiachen Lu, Merck & Co, Inc
Presenting Author
Jiachen Lu, Merck & Co, Inc
Functional enrichment analysis is often used to assess the effect of experimental differences. However, researchers sometimes want to understand the relationship between transcriptomic variation and health outcomes such as survival. We propose Survival-based Gene Set Enrichment Analysis (SGSEA) to identify biological functions associated with survival in a disease. Although such analyses have been described, no standard tools or software exist to perform them. We developed an R package and Shiny app called SGSEA and present a study of kidney renal clear cell carcinoma (KIRC) to demonstrate the approach. Unlike traditional Gene Set Enrichment Analysis (GSEA), which ranks genes by log-fold change, SGSEA ranks them by hazard ratios. Our study shows that pathways enriched with genes whose increased transcription is associated with mortality (NES > 0, adjusted p-value < 0.15) have previously been linked to KIRC survival, demonstrating the value of this approach. The method allows rapid identification of disease-relevant pathways and provides information supplementary to standard GSEA, all within a single R package or via the convenient app.
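The key difference from standard GSEA is the ranking statistic. A minimal sketch of that step (ours, not the SGSEA package's actual interface; expr, surv, and the column names are illustrative), using a univariate Cox model per gene:

import pandas as pd
from lifelines import CoxPHFitter

def rank_genes_by_hazard(expr: pd.DataFrame, surv: pd.DataFrame) -> pd.Series:
    # expr: samples x genes expression matrix; surv: time/event columns
    scores = {}
    for gene in expr.columns:
        df = surv[["time", "event"]].copy()
        df[gene] = expr[gene]
        cph = CoxPHFitter()
        cph.fit(df, duration_col="time", event_col="event")
        scores[gene] = cph.params_[gene]  # log hazard ratio for this gene
    # Positive scores: higher expression associated with higher mortality
    return pd.Series(scores).sort_values(ascending=False)

The ranked list would then feed the usual enrichment permutation machinery in place of the log-fold-change ranking.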
Keywords
Gene Set Enrichment Analysis (GSEA)
R package
Shiny App
Pathway enrichment analysis
Survival outcomes
Transcriptomic variation
Biological functions
Causal mediation analysis decomposes the total effect of a treatment on an outcome into the indirect effect, operating through the mediator, and the direct effect, operating through other pathways. One can estimate the pure or organic indirect effect by combining a hypothesized treatment effect on the mediator with outcome data without treatment. This methodology holds significant promise in selecting prospective treatments based on their indirect effect for further evaluation in randomized clinical trials.
We apply this methodology to assess which of two measures of HIV persistence is a more promising target for future HIV curative treatments. We combine a hypothesized treatment effect on two mediators, and outcome data without treatment, to compare the indirect effect of treatments targeting these mediators. Some HIV persistence measurements fall below the assay limit, leading to left-censored mediators. To address this issue, we assume that the outcome model extends to mediators below the assay limit and use maximum likelihood estimation. To address measurement error in the mediators, we adjust our estimates. Using data from completed ACTG studies, we estimate the pure or organic indirect effect of potential curative HIV treatments on viral suppression through weeks 4 and 8 after HIV medication interruption, mediated by two HIV persistence measures.
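For reference, with Y(a, m) the potential outcome under treatment a and mediator value m, and M(a) the potential mediator, the pure indirect effect takes the familiar form (the organic version, due to Lok, replaces M(1) with the mediator under a hypothesized "organic" intervention):

\[
\mathrm{PIE} = E\big[\, Y(0, M(1)) \,\big] - E\big[\, Y(0, M(0)) \,\big].
\]

Only the mediator shifts while treatment is held at its reference level, which is what allows outcome data collected without treatment to be combined with a hypothesized treatment effect on the mediator.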
Keywords
causal mediation analysis
causal inference
assay lower limit
measurement error
HIV/AIDS
indirect effects
Mendelian randomization (MR) is a powerful tool for evaluating causal effects in the presence of unmeasured confounding. With the ever-growing sample sizes in genome-wide association studies, there is a rising trend to perform MR analyses using summary data from genetic associations across diverse phenotypes. Traditional two-sample summary-data MR methods require that the genetic variants employed satisfy the exclusion restriction, a condition frequently violated due to pleiotropy. Although several approaches have been introduced to mitigate this issue, existing methods still fall short when it comes to precisely estimating causal effect sizes for binary outcomes. In this study, we introduce a novel statistical method specifically designed for binary outcome data within the two-sample summary-data MR framework, addressing challenges that commonly arise in practical applications. We demonstrate the efficacy of our method through extensive simulations under various scenarios and provide a comprehensive comparison with current methodologies.
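As a baseline for comparison, the standard inverse-variance weighted (IVW) estimator built from summary statistics \hat\gamma_j (variant-exposure association) and \hat\Gamma_j (variant-outcome association, with standard error \sigma_{\Gamma_j}) is

\[
\hat\beta_{\mathrm{IVW}}
= \frac{\sum_j \hat\gamma_j \hat\Gamma_j / \sigma_{\Gamma_j}^2}
       {\sum_j \hat\gamma_j^2 / \sigma_{\Gamma_j}^2},
\]

which is consistent only when every variant satisfies the exclusion restriction. For binary outcomes the \hat\Gamma_j are typically log odds ratios, and the noncollapsibility of the odds ratio is one known source of the effect-size bias that binary-outcome methods must contend with.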
Keywords
Mendelian randomization
binary outcome
summary data
pleiotropic effects
causal inference
Co-Author
An-Shun Tai, Institute of Statistics and Data Science, National Tsing Hua University
First Author
Chen-Hua Cho, National Tsing Hua University
Presenting Author
Chen-Hua Cho, National Tsing Hua University
In the context of mediation analysis, the presence of death-truncated variables poses a challenge, as conventional measures fail to accurately assess the role of a mediator in the effect of a treatment on a primary non-mortality outcome. This study introduces novel estimands – survivor natural direct and indirect effects – to address this issue. Exchangeability assumptions are employed to mitigate confounding, and empirical expressions are derived using information from a pretreatment surrogate variable akin to an instrumental variable. Three estimation approaches – model parameterization, generalized method of moments, and data-adaptive G-computation – are developed and applied to data from the National Emphysema Treatment Trial to illustrate the proposed method.
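One plausible formalization of these estimands (our notation), restricting to the principal stratum of always-survivors with survival indicators S(1) = S(0) = 1:

\[
\mathrm{SNDE} = E\big[\, Y(1, M(0)) - Y(0, M(0)) \,\big|\, S(1) = S(0) = 1 \,\big],
\]
\[
\mathrm{SNIE} = E\big[\, Y(1, M(1)) - Y(1, M(0)) \,\big|\, S(1) = S(0) = 1 \,\big],
\]

so the non-mortality outcome is contrasted only among individuals who would survive under either treatment, which is the stratum the pretreatment surrogate variable helps identify.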
Keywords
Causal mediation analysis
Data-adaptive G-computation
Death truncation
Non-mortality outcome
Survivor natural direct and indirect effects
First Author
An-Shun Tai, Institute of Statistics and Data Science, National Tsing Hua University
Presenting Author
An-Shun Tai, Institute of Statistics and Data Science, National Tsing Hua University
Mendelian randomization (MR) uses genetic variants as instrumental variables (IVs) to infer causal effects between an exposure and an outcome based on observational data. While various MR methods have been proposed and applied in recent years, most rely on the assumption of a linear relationship between the exposure and outcome, though this relationship may actually be nonlinear. In this study, we compare several nonlinear IV regression approaches (such as spline-based models, polynomial regression, and deep learning techniques) alongside two stratification-based nonlinear MR methods, doubly-ranked stratification and residual stratification, for estimating localized average causal effects (LACE). These methods are evaluated for their accuracy, efficiency, and robustness in handling complex, nonlinear relationships between the exposure, instruments, and outcome. Our findings provide valuable insights into the performance of these methods, guiding the selection of the most appropriate approach for nonlinear causal inference in MR.
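A minimal sketch of the residual stratification idea for LACE (our simplification to a single continuous instrument; variable names are illustrative):

import numpy as np

def lace_residual_stratification(g, x, y, n_strata=10):
    # Stage 1: residualize the exposure x on the instrument g
    slope = np.cov(g, x)[0, 1] / np.var(g, ddof=1)
    resid = x - slope * g
    # Stratify on the instrument-free residual; within each stratum,
    # form the ratio (Wald) IV estimate of the local causal effect
    edges = np.quantile(resid, np.linspace(0, 1, n_strata + 1))
    estimates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = (resid >= lo) & (resid <= hi)
        num = np.cov(g[idx], y[idx])[0, 1]  # instrument-outcome covariance
        den = np.cov(g[idx], x[idx])[0, 1]  # instrument-exposure covariance
        estimates.append(num / den)
    return np.array(estimates)

Doubly-ranked stratification replaces the residual with a rank-based construction intended to be more robust when the instrument-exposure relationship varies across individuals.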
Keywords
Causal effects
Genetic variants
Genome-Wide Association Studies (GWAS)
Transcriptome-Wide Association Studies (TWAS)
Co-Author
Wei Pan, University of Minnesota
First Author
Yizeng Li, University of South Carolina
Presenting Author
Yizeng Li, University of South Carolina
In clinical trials, it is important to understand whether the treatment effects are consistent across different subgroups defined based on key baseline factors. However, there is a lack of proper statistical methodology for testing treatment effect heterogeneity in cases where multiple imputation methods are used to handle missing data. Moreover, treatment effect heterogeneity is traditionally tested by adding treatment-by-subgroup interaction to the primary analysis models, but recently published analysis models for improved estimation efficiency can be too complicated to properly add such interaction terms. In this article, we propose a separate model framework to test the heterogeneity of treatment effect across subgroups by constructing a chi-square statistic based on the inferential results from models within each subgroup. Our proposed approach controls the type I error rate well by properly accounting for the correlations introduced during multiple imputation and is applicable to all analysis models. The performance of the proposed method is evaluated using simulations and applied to a real clinical trial.
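One natural form for such a statistic (our notation, not necessarily the authors' exact construction), with subgroup-specific estimates \hat\theta_g and Rubin's-rule variances \hat V_g for g = 1, ..., G:

\[
Q = \sum_{g=1}^{G} \frac{(\hat\theta_g - \bar\theta)^2}{\hat V_g},
\qquad
\bar\theta = \frac{\sum_g \hat\theta_g / \hat V_g}{\sum_g 1 / \hat V_g},
\]

compared against a \chi^2_{G-1} reference. The subtlety is that imputing all subgroups from a common model correlates the \hat\theta_g, so the statistic or its reference distribution must be adjusted, which is the correction the proposed framework supplies.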
Keywords
Subgroup
Separate Model
Rubin’s Rule
Bootstrap
Nonrandomized studies often suffer from confounding: without random assignment, statistical techniques are needed to approximate controlled experiments for valid causal inference. If confounding is mishandled, a study may falsely attribute the effect of a confounder to the exposure, leading to incorrect conclusions. To improve causal inference, such studies must mimic randomized controlled trials to ensure valid comparisons between treated and untreated groups. Propensity score methods are widely used to mitigate confounding by balancing covariates. While traditional approaches focus on binary treatments, multi-treatment settings introduce complexities in estimation and matching. This research develops a novel algorithm and R package for multi-treatment propensity score matching, integrating logistic regression, machine learning, and advanced matching methods. We evaluate performance across varying data structures and confounding levels using simulated and real-world datasets, measuring covariate balance, bias reduction, and treatment effect estimation. These findings advance multi-treatment propensity score methods, offering a more robust framework for causal inference in observational studies.
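A minimal sketch of the first step, generalized propensity scores for K treatments via multinomial logistic regression (ours, not the authors' package; names are illustrative):

from sklearn.linear_model import LogisticRegression

def generalized_propensity_scores(X, t):
    # X: n x p covariate matrix; t: treatment labels with K > 2 levels.
    # With the default lbfgs solver, scikit-learn fits a multinomial
    # objective when t has more than two classes.
    model = LogisticRegression(max_iter=1000)
    model.fit(X, t)
    return model.predict_proba(X)  # n x K matrix of P(T = k | X)

Matching then proceeds on the K-dimensional score vector (for example, nearest neighbors with calipers on each component), which is where most of the multi-treatment complexity lives.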
Keywords
Propensity Scores
Matching
Machine Learning
Treatment Comparison
Multivariate Regression
Balancing
Co-Author
Bong-Jin Choi, North Dakota State University
First Author
Lizzy Rono, North Dakota State University
Presenting Author
Lizzy Rono, North Dakota State University
This study examines the impact of digital courseware on undergraduate students' learning, engagement, and satisfaction in an introductory statistics course. It compares outcomes between students using the courseware and those receiving traditional instruction, investigating whether features such as self-assessments, personalized study plans, and formative practice with feedback enhance learning and engagement more effectively than conventional methods. The study also explores how incorporating real-world examples and authentic datasets influences student satisfaction and the perceived relevance of course content. Differences in performance, engagement, and satisfaction between the two groups will be assessed using final exam scores and course evaluations, while qualitative interviews with students who used the digital courseware will offer deeper insights into their experiences, the applicability of course content to real-world contexts, and overall course satisfaction. Study findings will help identify best practices for integrating technology and data-driven learning into undergraduate statistics education.
Keywords
digital courseware
introductory statistics
high impact practices
student engagement
formative assessment
feedback
To support Health Technology Assessment (HTA) submissions, we often need to conduct indirect treatment comparisons (ITC). One of the most popular ITC methods is the Matching-Adjusted Indirect Comparison (MAIC), in which individual patient data (IPD) from one trial and aggregate data (AgD) from another trial are compared on endpoints of interest, adjusting for between-trial differences in the distributions of covariates that influence outcome. In this abstract, we propose a unified pseudo-value approach to MAIC for time-to-event (TTE) endpoints, including survival rate, restricted mean survival time, and competing risks.
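The core MAIC step is a method-of-moments reweighting. A minimal sketch (ours; X_ipd rows are IPD covariates, agd_means the aggregate-data means, both illustrative names):

import numpy as np
from scipy.optimize import minimize

def maic_weights(X_ipd, agd_means):
    # Center IPD covariates at the aggregate-data means; the weights
    # w_i = exp(X_c[i] @ alpha) balance the means exactly at the optimum,
    # since the gradient of the objective is the weighted covariate sum.
    X_c = X_ipd - agd_means
    objective = lambda alpha: np.sum(np.exp(X_c @ alpha))
    res = minimize(objective, np.zeros(X_c.shape[1]), method="BFGS")
    return np.exp(X_c @ res.x)

Presumably the unifying device is then the pseudo-observation: replacing each subject's TTE endpoint (survival probability, RMST, or cumulative incidence under competing risks) with a jackknife pseudo-value lets the same weighted comparison apply across all three endpoint types.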
Keywords
competing risks
HTA
MAIC
restricted mean survival time
time-to-event
During the COVID-19 pandemic, many school systems implemented opt-in regular testing for students to track the spread of disease and detect cases early. Beyond their primary use for surveillance, the observational data collected from these programs can be leveraged to measure vaccine effectiveness (VE) among school-aged children. Data from these sources pose complicated challenges to the standard assumptions of vaccine effectiveness methodology, especially when there is evidence of differential testing behavior between the vaccinated and unvaccinated groups. To combat this issue, we explore approaches to characterize the differences in testing behavior and improve the implementation of standard VE methodology. We apply three methods for measuring VE to the observational data: a target trial emulation approach with matching of participants across vaccination groups, a time-varying effect model of vaccination, and a test-negative design. For these methods, we compare the losses in sample size due to study design, discuss approaches to adjust for differential testing behavior, and consider additional sources of bias due to unmet assumptions.
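For instance, in the test-negative design, VE is estimated from the odds ratio comparing the odds of vaccination between test-positive cases and test-negative controls:

\[
\widehat{\mathrm{VE}} = 1 - \widehat{\mathrm{OR}}
= 1 - \frac{\text{odds of vaccination among test-positives}}{\text{odds of vaccination among test-negatives}}.
\]

Its appeal in this setting is that conditioning on having been tested partially controls for testing behavior, though differential opt-in testing by vaccination status can still bias the estimate.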
Keywords
Vaccine Effectiveness
Target Trial Emulation
Test-negative Design
Time-varying Effect
COVID-19
Observational Study
Alzheimer's disease (AD) is a complex and progressive neurodegenerative disorder that accounts for the majority of individuals with dementia. Here we aim to identify causal plasma proteins for AD, shedding light on the etiology of AD. We utilized the latest large-scale plasma proteomic data from the UK Biobank Pharma Proteomics Project and AD GWAS summary data from the International Genomics of Alzheimer's Project. Via a univariate instrumental variable (IV) regression method, we identified causal proteins through cis-pQTLs and through (both cis- and trans-) pQTLs. To further reduce potential false positives due to high linkage disequilibrium of some pQTLs and high correlations among some proteins, we developed a multivariate IV regression method, called 2-Stage Constrained Maximum Likelihood (MV-2ScML), to distinguish direct and confounding effects of proteins; key features of the method include its robustness to invalid IVs and applicability to GWAS summary data. Our work highlights differences between using cis-pQTLs and trans-pQTLs, and the critical role of multivariate analysis in detecting causal proteins with direct effects, providing insights into plasma protein pathways to AD.
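For context, with instrument matrix Z (pQTLs), exposure matrix X (proteins), and outcome Y, the classical two-stage least squares estimator on which 2ScML-type methods build is

\[
\hat\beta_{\mathrm{2SLS}} = \big( X^\top P_Z X \big)^{-1} X^\top P_Z Y,
\qquad P_Z = Z (Z^\top Z)^{-1} Z^\top.
\]

As described in the abstract, MV-2ScML replaces the second stage with a constrained maximum likelihood fit usable with GWAS summary data, which is what confers robustness to invalid IVs.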
Keywords
2ScML
2SLS
constrained maximum likelihood
instrumental variable (IV)
pleiotropy
The growth of AI- and ML-based clinical decision tools provides an array of decision-aid agents that can be implemented into a clinician's decision-making process. However, few tools exist for context-specific evaluation of the alignment of these agents with clinicians' workflows, and thus no method exists to identify an optimal set of aligned agents to adopt. Our work adopts the multinomial logit choice (MNL) model as a framework for evaluating agent alignment and identifying an optimal agent set. We assume the observation of selections among a set of agents according to a context-dependent MNL model, characterized by context-dependent preference parameters. We propose a standard regularized maximum likelihood estimation (MLE) procedure, providing a uniform convergence rate over a bounded context space. Additionally, when agent-specific utility parameters or functions are known, an optimal assortment of agents can be identified. This work provides a novel estimate of context-specific alignment of decision-making agents, drawing on results in relevance-weighted likelihood, uniform rates in nonparametric kernel regression, and previous results for the static MNL model.
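Concretely, under a context-dependent MNL model with context x, offered agent set S, and utility functions u_j(x), the selection probabilities take the form (our notation):

\[
P\big( \text{select } i \mid S, x \big)
= \frac{\exp\{u_i(x)\}}{\sum_{j \in S} \exp\{u_j(x)\}}, \qquad i \in S,
\]

and the regularized MLE targets the preference functions u_j(\cdot) uniformly over a bounded context space, after which assortment optimization over S can proceed as in the static MNL literature.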
Keywords
decision-aids
non-parametric regression
relevance-weighted likelihood
optimal assortment
Co-Author
Junwei Lu, Harvard T.H. Chan School of Public Health
First Author
Dominic DiSanto, Harvard T.H. Chan School of Public Health
Presenting Author
Dominic DiSanto, Harvard T.H. Chan School of Public Health
Pregnancy is a significant period in a woman's life, often accompanied by both mental and physical stressors. Identifying mediators of the associations between such stressors and maternal health is crucial for early intervention and improved maternal health outcomes. The growing use of wearable devices enables continuous monitoring of heart rate variability (HRV), sleep patterns, and physical activity.
This study aims to assess the heterogeneity introduced by individual behavioral patterns in wearable device data. Specifically, our research investigates potential mediators between stress and age (≥30), as well as stress and BMI (≥25), during the second and third trimesters of pregnancy. An individualized mediation effect approach incorporating subgrouping is proposed to identify relevant mediators, including daily step count, deep sleep, REM sleep, and a weekly negative-emotions score derived from an ecological momentary assessment (EMA) questionnaire. Additionally, time-varying mediation models are used to capture dynamic changes in the mediation effects. By integrating these methods, this research aims to enhance our understanding of stress-related health disparities during pregnancy and support the development of more personalized interventions.
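For the time-varying piece, one standard formulation (our notation, not necessarily the authors' exact model) with exposure X, mediator M_t, and outcome Y_t uses time-varying coefficients:

\[
M_t = \alpha(t)\, X + \varepsilon_{1t}, \qquad
Y_t = \gamma(t)\, X + \beta(t)\, M_t + \varepsilon_{2t},
\]

so the mediation effect \alpha(t)\beta(t) is itself a smooth function of gestational time rather than a single number.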
Keywords
mediation effect model
heterogeneous data
individualized model
wearable device
subgroup analysis
mobile health
The goal of Growth Mixture Modeling (GMM) is to identify underlying latent groups of units that are qualitatively different in their growth trajectories. Among the various assumptions needed for GMM to work, one that is often taken for granted is that the residuals of the growth curve portion are normally distributed. Kim et al. showed that violations of this assumption can have serious consequences for GMMs. Most notably, one may arrive at the incorrect number of latent classes due to "the relationship between class membership recovery and the proportion of outliers" in the sample of interest. As such, the use of traditional mean-based GMM can lead to misleading conclusions not just about the qualitative differences between latent classes but, more fundamentally, about the number of latent classes itself. More robust approaches such as median-based (and, by extension, quantile-based) GMM are therefore essential advancements to consider. In this paper we extend median GMM to arbitrary quantiles of the weight loss distribution for a bariatric surgery cohort by leveraging the location-scale mixture representation of the Asymmetric Laplace Distribution.
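The representation in question: if W ~ Exp(1) and Z ~ N(0, 1) independently, then

\[
Y = \mu + \sigma\,\theta\, W + \sigma\,\tau\,\sqrt{W}\, Z, \qquad
\theta = \frac{1 - 2q}{q(1-q)}, \quad \tau^2 = \frac{2}{q(1-q)},
\]

has the asymmetric Laplace distribution AL(\mu, \sigma, q), whose maximum likelihood fit corresponds to regression on the q-th quantile. Conditioning on W makes the model conditionally Gaussian, so standard mixture-model machinery carries over to quantile-based GMM.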
Keywords
Growth mixture modeling
Quantile regression
Growth curve modeling
In the estimand framework, reference-based imputation (RBI) methods are recommended under a hypothetical strategy to indicate unfavorable outcomes for patients with intercurrent events (ICEs). Traditionally, RBI methods are used as sensitivity analyses to explore deviations from the missing at random (MAR) assumption. This presentation explores the integration of RBIs with mixed models for repeated measures (MMRMs) in primary analyses for continuous longitudinal endpoints.
Different RBI methods (e.g., jump to reference, copy increments in reference) will be applied to specific ICEs (e.g., death, adverse events) with categorical time MMRMs for analyzing changes at a pre-specified time point or with continuous time MMRMs for analyzing the rate of change over time. Simulation studies will evaluate the operating characteristics of these models. Case studies will demonstrate the application of the proposed RBI methods integrated with MMRMs in real-world scenarios, highlighting strengths and limitations, and clarifying interpretation of results.
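For a subject with an ICE at visit t*, the imputation mean profiles behind the two named RBI methods are, in our shorthand (with \mu_{a,t} the arm-by-visit means):

\[
\text{J2R: } \mu^{*}_t = \mu_{\mathrm{ref},t} \quad (t \ge t^{*}),
\qquad
\text{CIR: } \mu^{*}_t = \mu_{\mathrm{trt},t^{*}} + \big( \mu_{\mathrm{ref},t} - \mu_{\mathrm{ref},t^{*}} \big) \quad (t \ge t^{*}),
\]

with pre-ICE means following the subject's own arm. Conditional imputation additionally borrows the reference-arm covariance structure, which is straightforward with categorical-time MMRMs but requires care with continuous-time ones.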
Keywords
reference-based imputation
mixed model for repeated measures
intercurrent events