Thursday, Aug 7: 10:30 AM - 12:20 PM
4227
Contributed Papers
Music City Center
Room: CC-103B
Main Sponsor
Biometrics Section
Presentations
The instrumental variable (IV) approach is a widely used method for estimating the average treatment effect (ATE) in the presence of unmeasured confounders. Existing methods for continuous IVs often rely on structural equation modeling, which imposes strong parametric assumptions and can yield biased estimates, particularly for binary outcomes. In this work, we propose a novel nonparametric identification strategy for the ATE using a continuous IV under the potential outcome framework, leveraging the conditional weighted average derivative effect. For estimation, we assume a partial linear model for the IV-treatment relationship. Under this model, we develop a bounded, locally efficient, and multiply robust estimator that extends the properties of semiparametric efficient estimators for binary IVs to continuous IVs. Notably, our estimator remains consistent even if the partial linear model is misspecified. Simulation results demonstrate that our proposed multiply robust estimator is unbiased and robust to model misspecification. Finally, we apply the proposed estimators to estimate the causal effect of obesity on the two-year mortality rate of non-small cell lung cancer patients.
Keywords
Average Treatment Effect
Continuous Instrumental Variable
Semiparametric Efficiency
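As a point of reference for the continuous-IV setting described above (a minimal covariance-ratio baseline, not the authors' multiply robust estimator), the sketch below computes a Wald-type IV estimate after linearly partialling covariates out of the outcome, treatment, and instrument; the linear nuisance models and simulated variables are illustrative assumptions.

```python
import numpy as np

def iv_ratio_ate(y, d, z, x):
    """Wald-type IV estimate: Cov(Z~, Y~) / Cov(Z~, D~), where ~ denotes
    residuals from a linear regression on covariates X (an assumed,
    simplistic nuisance model -- not the multiply robust construction)."""
    X = np.column_stack([np.ones(len(x)), x])
    def resid(v):
        beta, *_ = np.linalg.lstsq(X, v, rcond=None)
        return v - X @ beta
    yt, dt, zt = resid(y), resid(d), resid(z)
    return np.dot(zt, yt) / np.dot(zt, dt)   # ratio of empirical covariances

# Toy data with an unmeasured confounder u; true effect of D on Y is 1.5.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
z = rng.normal(size=n)                       # continuous instrument
u = rng.normal(size=n)                       # unmeasured confounder
d = 0.8 * z + 0.5 * x + u + rng.normal(size=n)
y = 1.5 * d + 0.3 * x - u + rng.normal(size=n)
print(iv_ratio_ate(y, d, z, x))              # close to 1.5 despite confounding
```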
In biomedical research, recurrent events like coronary heart disease, stroke, and heart failure often result in terminal outcomes such as death. Understanding these relationships is essential for developing effective interventions. This study proposes a Bayesian semiparametric joint dynamic model that captures event dependencies, cumulative effects of past recurrent events on themselves and on terminal events, covariates, and frailty. Gamma process priors are used for the baseline cumulative hazard function (CHF), with parametric priors for covariate and frailty effects. In addition to incorporating gap-time distributions for more accurate risk assessment, the model provides an analytical closed-form estimator of the CHF, with parameter estimates obtained through MCMC. Breslow-Aalen-type estimators of baseline CHFs arise as special cases of our estimators when the precision parameters are set to zero. Simulations validate the model's accuracy and its advantages over a frequentist counterpart, and its application to the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT-LLT) study offers new insights into preventing cardiovascular disease and reducing its mortality risks.
Keywords
Gap time
Multitype recurrent events
Terminal event
Bayesian semiparametric joint dynamic model
Gamma process prior
ALLHAT-LLT Study
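The gamma process prior named above can be illustrated in a few lines. This is a sketch under assumed parameter values (constant prior-mean hazard rate `lam0`, precision `precision`), not the paper's full MCMC: increments are gamma distributed with mean matching a prior-guess CHF, and, as the abstract notes, letting the precision go to zero recovers a diffuse, Breslow-Aalen-type limit.

```python
import numpy as np

def gamma_process_chf(grid, lam0=0.1, precision=2.0, seed=None):
    """One draw of a gamma process prior for a baseline cumulative hazard.
    Increments ~ Gamma(shape = c * dLambda0, rate = c), so the prior mean
    is Lambda0(t) = lam0 * t, and a larger precision c tightens the paths
    around it; c -> 0 gives a diffuse prior."""
    rng = np.random.default_rng(seed)
    d_lam0 = lam0 * np.diff(grid)                    # prior-mean increments
    incr = rng.gamma(shape=precision * d_lam0, scale=1.0 / precision)
    return np.concatenate(([0.0], np.cumsum(incr)))  # nondecreasing CHF path

t = np.linspace(0.0, 10.0, 101)
chf_path = gamma_process_chf(t, seed=1)              # one prior sample path
```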
Mediation analysis is widely used for exploring treatment mechanisms, but it faces challenges from nonignorable missing confounders, particularly in questionnaire surveys within observational studies. Efficient inference about mediation effects, and the efficiency loss due to nonignorable missingness, have rarely been studied in the literature because of the difficulties arising from the ill-posed inverse problem. To address this issue, we first show that the mediation effect of interest can be identified as a weighted average of an iterated conditional expectation with an available shadow variable. We then propose a Sieve-based Iterative Outward (SIO) approach for estimation. We establish the large-sample theory, in particular the asymptotic normality, of the proposed estimator despite the challenges of the ill-posed problem. We show that our estimator is locally efficient and attains the semiparametric efficiency bound under certain conditions. We accurately characterize the efficiency loss attributable to missingness and identify scenarios in which no efficiency is lost. We also propose a stable and easy-to-implement approach to estimating the asymptotic variance and constructing confidence intervals for mediation effects. The finite-sample performance of our proposed approach is evaluated through simulation studies, and we apply it to the CFPS data to demonstrate its practical applicability.
Keywords
Direct and indirect effects
Ill-posed inverse problem
Missing not at random
Semiparametric efficiency bound
Sieve approximation
First Author
Jiawei Shan, University of Wisconsin-Madison
Presenting Author
Jiawei Shan, University of Wisconsin-Madison
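The sieve approximation named in the keywords above is the workhorse behind the SIO estimator. The following is a generic sketch of that one ingredient (a polynomial sieve of assumed dimension `k` for a conditional expectation), not the proposed iterative procedure itself.

```python
import numpy as np

def sieve_fit(x, y, k=6):
    """Approximate m(x) = E[Y | X = x] by least squares on a polynomial
    sieve of dimension k; letting k grow with the sample size is what
    makes the sieve nonparametric rather than a fixed parametric model."""
    B = np.vander(x, k, increasing=True)         # basis 1, x, ..., x^{k-1}
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return lambda xnew: np.vander(np.atleast_1d(xnew), k, increasing=True) @ coef

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = np.sin(np.pi * x) + 0.1 * rng.normal(size=500)
m_hat = sieve_fit(x, y)
print(m_hat(np.array([0.0, 0.5])))               # approximately [0.0, 1.0]
```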
Federated learning enables collaborative data analysis while preserving privacy, making it particularly valuable in multisite healthcare studies where data sharing is restricted. The proportional likelihood ratio model (PLRM) is a flexible semi-parametric framework used in these settings, often assuming a common regression coefficient β across sites. However, real-world differences in population characteristics and study protocols can lead to slight variations in β. To address this, we develop a federated learning method that allows for minor variations in β while still leveraging global information to improve estimation efficiency at a primary site. Unlike existing methods that focus on site-specific nuisance parameters, our approach explicitly models and accounts for β heterogeneity, enhancing robustness in distributed inference.
Keywords
Federated Learning
Semi-parametric Methods
Proportional Likelihood Ratio Model
Distributed Inference
Heterogeneous Data Analysis
Multisite Studies
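One simple way to picture the borrowing described above is a precision-weighted combination in which an assumed between-site variance term discounts external sites' estimates. This is a hedged random-effects-style sketch, not the paper's federated PLRM estimator; `tau2` and all inputs are illustrative.

```python
import numpy as np

def combine_with_heterogeneity(beta_primary, var_primary,
                               beta_sites, var_sites, tau2):
    """Precision-weighted combination of a primary-site estimate with
    external sites' estimates. tau2 is an assumed between-site variance
    for beta: tau2 = 0 treats beta as common across sites, while a large
    tau2 discounts external information and falls back on the primary
    site alone."""
    beta_sites = np.asarray(beta_sites, dtype=float)
    var_sites = np.asarray(var_sites, dtype=float)
    w_ext = 1.0 / (var_sites + tau2)     # heterogeneity inflates site variance
    w_loc = 1.0 / var_primary
    est = (w_loc * beta_primary + np.sum(w_ext * beta_sites)) / (w_loc + np.sum(w_ext))
    se = np.sqrt(1.0 / (w_loc + np.sum(w_ext)))
    return est, se

est, se = combine_with_heterogeneity(0.50, 0.04, [0.55, 0.42, 0.61],
                                     [0.05, 0.06, 0.05], tau2=0.01)
```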
Isolating relevant variation in data and decomposing it into interpretable processes is critical for hypothesis-driven research. In multimodal data analysis, classical simultaneous component analysis can prioritize irrelevant variance components (e.g., in integrative genomics analyses of thousands of features). Supervised PCA-type methods, developed for prediction tasks, limit us to settings with a measured response, assume that relevance is captured by the response alone, and do not always lead to stable interpretations. We propose a semiparametric approach in which the simultaneous components are modeled as functions in a reproducing kernel Hilbert space, conducive to statistical modeling and yielding smoothing spline ANOVA decompositions. The result is a sequence of components (processes) ranked by how much of the total relevant data variation they explain; the processes themselves are largely explained in terms of simple functions of known predictors, enhancing interpretability. The quality of the inferences obtained for a variety of research questions (e.g., cancer progression, host-pathogen response) demonstrates the advantages of the proposal over competing methods.
Keywords
PCA
multimodal
integrative
data analysis
interpretation
SSANOVA
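A minimal sketch of the RKHS modeling ingredient named above, using kernel ridge regression with a Gaussian kernel. The kernel choice, penalty, and one-dimensional setting are illustrative assumptions; the proposal builds full smoothing spline ANOVA decompositions rather than a single smoother.

```python
import numpy as np

def kernel_ridge(x, y, lam=0.1, scale=0.5):
    """Fit f in an RKHS by minimizing squared error plus lam * ||f||^2;
    by the representer theorem the solution is a kernel expansion over
    the training points."""
    def gram(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * scale ** 2))
    alpha = np.linalg.solve(gram(x, x) + lam * np.eye(len(x)), y)
    return lambda xnew: gram(np.atleast_1d(xnew), x) @ alpha

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=200)
f_hat = kernel_ridge(x, y)
print(f_hat(np.array([0.25, 0.75])))      # approximately [1.0, -1.0]
```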
Employing a deep Cox model (DCM) allows for capturing nonlinear behavior in the survival time of a person with end-stage kidney disease (ESKD). However, this approach by itself provides no explanation and does not determine the importance of different features. In this work, we combine the DCM with Stochastic Variable Selection (SVS) to address this gap in the modeling. Furthermore, we study the effect of sample size on the accuracy of the DCM relative to the traditional Cox model using the Harrell C index. Our results indicate that gains occur only when the sample size is very large. The variables selected by this analysis were consistent with those selected using likelihood-based methods.
The research reported in this abstract was supported by South Dakota State University, AIM-AHEAD Coordinating Center, award number OTA-21-017, and was, in part, funded by the National Institutes of Health Agreement No. 1OT2OD032581. The work is solely the responsibility of the authors and does not necessarily represent the official view of AIM-AHEAD or the National Institutes of Health.
Keywords
Deep Cox
Survival Analysis
End Stage Kidney Diseases
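Since the model comparison above rests on the Harrell C index, here is a minimal sketch of its computation (the simple variant that compares an event time only against strictly later follow-up times; a well-calibrated model assigns higher risk to earlier events).

```python
import numpy as np

def harrell_c(time, event, risk):
    """Harrell's concordance index. A pair (i, j) is comparable when i
    has an observed event and j's follow-up time is strictly longer; the
    pair is concordant when the earlier-event subject i has the higher
    predicted risk. Ties in risk count as 0.5."""
    time, event, risk = map(np.asarray, (time, event, risk))
    num = den = 0.0
    for i in np.flatnonzero(event == 1):
        comparable = time > time[i]              # subjects outliving i
        den += comparable.sum()
        num += (risk[i] > risk[comparable]).sum()
        num += 0.5 * (risk[i] == risk[comparable]).sum()
    return num / den

c = harrell_c([2, 4, 5, 7], [1, 1, 0, 1], [0.9, 0.6, 0.5, 0.2])  # c = 1.0
```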
The slow progression of Huntington's disease often hinders estimation of how symptoms change over time because of right-censoring, whereby patients may stop visiting after a certain period. The censoring mechanism may depend on disease severity, with more severe patients more likely to exit studies early. In this outcome-dependent censoring scenario, existing estimators either fail to achieve consistency or require correct specification of the working models. We propose SPYCE, a doubly robust semiparametric estimator that remains consistent even when one of the models is misspecified. Using semiparametric theory, we show that SPYCE is consistent and asymptotically normal under parametric nuisance models, attaining the smallest variance when both nuisance models are consistently estimated. Through kernel estimation and inverse probability weighting, we introduce flexibility into the nuisance models while retaining the same guarantees. Simulation studies confirm the efficiency and double robustness of SPYCE compared with existing methods. Finally, analyzing the PREDICT-HD dataset, we find that SPYCE yields different conclusions about how symptoms change over time than conventional, error-prone methods.
Keywords
semiparametric modeling
right-censoring
double robustness
inverse probability weighting
kernel estimation
Huntington's disease
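As a minimal illustration of the inverse probability weighting ingredient listed above (a sketch for a missing-outcome mean with assumed known observation probabilities, not the SPYCE estimator itself):

```python
import numpy as np

def ipw_mean(y, observed, p_obs):
    """Inverse-probability-weighted mean of Y when some outcomes are
    unobserved: each observed Y is upweighted by 1 / P(observed) to
    stand in for similar unobserved subjects. The normalized (Hajek)
    form is used for numerical stability."""
    w = np.asarray(observed, dtype=float) / np.asarray(p_obs, dtype=float)
    y = np.where(observed, y, 0.0)        # unobserved outcomes never enter
    return np.sum(w * y) / np.sum(w)

# Toy data with outcome-dependent observation: true mean of Y is 1.0.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(size=n)
y = 2.0 * x + rng.normal(size=n)
p_obs = 0.3 + 0.6 * x                     # observation depends on x
observed = rng.uniform(size=n) < p_obs
print(ipw_mean(y, observed, p_obs))       # near 1.0 despite selective dropout
```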