Semiparametric Modeling Advanced Research

Wei-Wen Hsu Chair
University of Cincinnati
 
Thursday, Aug 7: 10:30 AM - 12:20 PM
4227 
Contributed Papers 
Music City Center 
Room: CC-103B 

Main Sponsor

Biometrics Section

Presentations

Average Treatment Effect with Continuous Instrumental Variables

The instrumental variable (IV) approach is a widely used method for estimating the average treatment effect (ATE) in the presence of unmeasured confounders. Existing methods for continuous IVs often rely on structural equation modeling, which imposes strong parametric assumptions and can yield biased estimates, particularly for binary outcomes. In this work, we propose a novel nonparametric identification strategy for the ATE using a continuous IV under the potential outcome framework, leveraging the conditional weighted average derivative effect. For estimation, we assume a partial linear model for the IV-treatment relationship. Under this model, we develop a bounded, locally efficient, and multiply robust estimator that extends the properties of semiparametric efficient estimators for binary IVs to continuous IVs. Notably, our estimator remains consistent even if the partial linear model is misspecified. Simulation results demonstrate that our proposed multiply robust estimator is unbiased and robust to model misspecification. Finally, we apply the proposed estimators to estimate the causal effect of obesity on the two-year mortality rate of non-small cell lung cancer patients. 

Keywords

Average Treatment Effect

Continuous Instrumental Variable

Semiparametric Efficiency 

Co-Author(s)

Dingke Tang
Wei Xu, University of Toronto
Linbo Wang, University of Toronto

First Author

Mei Dong

Presenting Author

Mei Dong

Bayesian Model of Gap Times for Multitype Recurrent and a Terminal Event: A Joint Dynamic Framework

In biomedical research, recurrent events like coronary heart disease, stroke, and heart failure often result in terminal outcomes such as death. Understanding these relationships is essential for developing effective interventions. This study proposes a Bayesian semiparametric joint dynamic model that captures event dependencies, cumulative effects of past recurrent events on themselves and terminal events, covariates, and frailty. Gamma process priors are used for the baseline cumulative hazard function (CHF) and parametric priors for covariates and frailty. In addition to incorporating gap time distributions for more accurate risk assessment, this model provides an analytical closed-form estimator of the CHF and parameter estimates through MCMC. Breslow-Aalen-type estimators of baseline CHFs are special cases of our estimators when precision parameters are set to zero. Simulations validate the model's accuracy and its superiority over the frequentist model, while its application to the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT-LLT) study offers new insights into preventing cardiovascular disease and reducing its mortality risks.
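For readers unfamiliar with the frequentist estimators named in this abstract, the following is a minimal sketch of the classical Nelson-Aalen estimator of a cumulative hazard function in the simplest setting (one event type, no covariates, no frailty). It is not the authors' Bayesian estimator; the function name `nelson_aalen` and the toy data in the usage note are illustrative assumptions.

```python
def nelson_aalen(times, events):
    """Classical Nelson-Aalen estimate of the cumulative hazard.

    times  : observed follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (t, H(t)) pairs, one per distinct event time.
    Illustrative sketch only: no covariates, frailty, or priors.
    """
    order = sorted(range(len(times)), key=lambda k: times[k])
    at_risk = len(times)      # subjects still under observation
    cum_hazard = 0.0
    curve = []
    idx = 0
    while idx < len(order):
        t = times[order[idx]]
        n_events = 0          # events observed exactly at time t
        n_leaving = 0         # events plus censorings at time t
        while idx < len(order) and times[order[idx]] == t:
            n_events += 1 if events[order[idx]] else 0
            n_leaving += 1
            idx += 1
        if n_events > 0:
            # hazard increment: events at t divided by number at risk at t
            cum_hazard += n_events / at_risk
            curve.append((t, cum_hazard))
        at_risk -= n_leaving
    return curve
```

With hypothetical data `times=[1, 2, 2, 3]` and `events=[1, 1, 0, 1]`, the increments are 1/4, 1/3, and 1/1 at times 1, 2, and 3.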

Keywords

Gap time

Multitype recurrent events

Terminal event

Bayesian semiparametric joint dynamic model

Gamma process prior

ALLHAT-LLT Study 

Co-Author

AKM Fazlur Rahman, University of Alabama at Birmingham

First Author

Mithun Kumar Acharjee

Presenting Author

Mithun Kumar Acharjee

Efficient Nonparametric Inference for Mediation Analysis with Nonignorable Missing Confounders

Mediation analysis is widely used for exploring treatment mechanisms, but it faces challenges due to the presence of nonignorable missing confounders, particularly in questionnaire surveys of observational studies. Efficient inference about mediation effects and efficiency loss due to nonignorable missingness have rarely been studied in the literature because of the difficulties arising from the ill-posed inverse problem. To address this issue, we first show that the mediation effect of interest can be identified as a weighted average of an iterated conditional expectation with an available shadow variable. We then propose a Sieve-based Iterative Outward (SIO) approach for estimation. We establish the large sample theory, particularly the asymptotic normality, of the proposed estimator despite the challenges of the ill-posed problem. We show that our estimator is locally efficient and attains the semiparametric efficiency bound under certain conditions. We accurately depict the efficiency loss attributable to missingness and identify scenarios in which efficiency loss is absent. We also propose a stable and easy-to-implement approach to estimate asymptotic variance and construct confidence intervals for mediation effects. The finite-sample performance of our proposed approach is evaluated through simulation studies, and we apply it to the CFPS data to demonstrate its practical applicability.

Keywords

Direct and indirect effects

Ill-posed inverse problem

Missing not at random

Semiparametric efficiency bound

Sieve approximation 

First Author

Jiawei Shan, University of Wisconsin-Madison

Presenting Author

Jiawei Shan, University of Wisconsin-Madison

Federated Proportional Likelihood Ratio Model for Heterogeneous Multisite Studies

Federated learning enables collaborative data analysis while preserving privacy, making it particularly valuable in multisite healthcare studies where data sharing is restricted. The proportional likelihood ratio model (PLRM) is a flexible semi-parametric framework used in these settings, often assuming a common regression coefficient β across sites. However, real-world differences in population characteristics and study protocols can lead to slight variations in β. To address this, we develop a federated learning method that allows for minor variations in β while still leveraging global information to improve estimation efficiency at a primary site. Unlike existing methods that focus on site-specific nuisance parameters, our approach explicitly models and accounts for β heterogeneity, enhancing robustness in distributed inference. 

Keywords

Federated Learning

Semi-parametric Methods

Proportional Likelihood Ratio Model

Distributed Inference

Heterogeneous Data Analysis

Multisite Studies 

First Author

Tinghui Xu

Presenting Author

Tinghui Xu

Interpretable, simultaneous relevant components from multimodal data with structured iSSANOVA-PCA

Isolating relevant variation in data and decomposing it into interpretable processes is critical for hypothesis-driven research. In multimodal data analysis, classical simultaneous components analysis can prioritize irrelevant variance components (e.g., in integrative genomics data analysis of thousands of features). Supervised PCA-type methods, developed for prediction tasks, limit us to situations with a measured response, assume that relevance is captured by the response alone, and do not always lead to stable interpretations. We propose a semiparametric approach, where the simultaneous components are modeled as functions in a reproducing kernel Hilbert space, conducive for statistical modeling, yielding smoothing spline ANOVA decompositions. The result is a sequence of components (processes), ranked in their order of explaining total relevant data variation; the processes themselves are largely explained in terms of simple functions of known predictors, enhancing interpretability. The quality of relevant inferences obtained for a variety of research questions (e.g., cancer progression, host-pathogen response) demonstrates the significance of the proposal over the competition.

Keywords

pca

multi-modal

integrative

data analysis

interpretation

ssanova 

Co-Author

Rafael Irizarry, Dana-Farber Cancer Institute

First Author

Senthil Kumar Muthiah

Presenting Author

Senthil Kumar Muthiah

Semi-Explainable Deep Cox Model for Analyzing Survival Time in End Stage Kidney Diseases

Employing a deep Cox model (DCM) allows for capturing the non-linear behavior in the survival time of a person with end-stage kidney disease (ESKD). However, this approach does not provide any explanation or determine the importance of different features. In this work, we combine DCM with Stochastic Variable Selection (SVS) to address this gap in the modeling. Furthermore, we study the effect of the sample size on the accuracy of the DCM compared to the traditional Cox model using Harrell's C-index. Our results indicate that gains only occur when the sample size is very large. The results of this analysis were consistent with variables selected using likelihood-based methods.
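As background on the accuracy measure used in this abstract, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is a plain O(n^2) illustration, not the authors' implementation; the function name `harrell_c_index` is hypothetical, and pairs with tied observed times are simply skipped, which is a simplifying assumption.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C: among comparable pairs, the share in which the subject
    with the shorter observed time has the higher predicted risk.

    times       : observed follow-up times
    events      : 1 if the event was observed, 0 if censored
    risk_scores : model output, higher meaning higher predicted risk
    Simplification (assumption): pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied observed times: skipped in this sketch
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to uninformative predictions and 1.0 to a perfect ranking, so comparing two models on the same held-out data amounts to comparing their values of this index.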

The research reported in this abstract was supported by South Dakota State University, AIM-AHEAD Coordinating Center, award number OTA-21-017, and was, in part, funded by the National Institutes of Health Agreement No. 1OT2OD032581. The work is solely the responsibility of the authors and does not necessarily represent the official view of AIM-AHEAD or the National Institutes of Health. 

Keywords

Deep Cox

Survival Analysis

End Stage Kidney Diseases 

Co-Author

Semhar Michael, South Dakota State University

First Author

Hossein Moradi Rekabdarkolaee, South Dakota State University

Presenting Author

Hossein Moradi Rekabdarkolaee, South Dakota State University

SPYCE: Semi-Parametric Y-dependent right-Censored Covariate Estimator

Slow progression of Huntington's disease often hinders estimating how symptoms change over time due to right-censoring, where patients may not visit after a certain period. The censoring mechanism may depend on disease severity, with severe patients more likely to exit studies early. In this outcome-dependent censoring scenario, existing estimators fail to achieve consistency or require correct estimation of the models. We propose SPYCE, a doubly robust semiparametric estimator, which is consistent even when one of the models is misspecified. Utilizing semiparametric theory, we show that SPYCE is consistent and asymptotically normal for parametric nuisance models, having the smallest variance when both nuisance parameters are consistently estimated. Through kernel estimation and inverse probability weighting, we introduce flexibility in the nuisance models while retaining the same results. Simulation studies confirm the efficiency and double robustness of SPYCE compared to existing methods. Finally, analyzing the PREDICT-HD dataset, we discover that SPYCE gives different results about how symptoms change over time compared to conventional methods that are prone to error. 

Keywords

semiparametric modeling

right-censoring

double robustness

inverse probability weighting

kernel estimation

Huntington's disease 

Co-Author(s)

Yanyuan Ma, Penn State University
Tanya Garcia

First Author

Kihyun Han

Presenting Author

Kihyun Han