Monday, Aug 4: 10:30 AM - 12:20 PM
4061
Contributed Papers
Music City Center
Room: CC-103B
Main Sponsor
Section on Statistical Learning and Data Science
Presentations
Mediation analysis is a crucial tool for uncovering the mechanisms through which a treatment affects the outcome, providing deeper causal insights and guiding effective interventions.
Despite advances in analyzing the mediation effect with fixed/low-dimensional mediators and covariates, our understanding of estimation of and inference on the mediation functional in the presence of (ultra)-high-dimensional mediators and covariates is still limited. In this paper, we present an estimator for the mediation functional in a high-dimensional setting that accommodates the interaction between covariates and treatment in generating mediators, as well as both covariate-treatment and mediator-treatment interactions in generating the response. We demonstrate that our estimator is $\sqrt{n}$-consistent and asymptotically normal, thus enabling reliable inference on direct and indirect treatment effects with asymptotically valid confidence intervals. A key technical contribution of our work is a multi-step debiasing technique, which may also be valuable in other statistical settings with similar structural complexities where accurate estimation depends on debiasing.
We evaluate our proposed methodology through extensive simulation studies and apply it to the TCGA lung cancer dataset to estimate the effect of smoking, mediated by DNA methylation, on the survival time of lung cancer patients.
Keywords
Causal Mediation Analysis
Debiased Estimation
Ultra-High-Dimensional Models
Interaction Effects
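As a rough illustration of the direct and indirect effects discussed in the mediation abstract above, the following sketch uses a toy low-dimensional linear model with a single mediator, a randomized binary treatment, and a treatment-mediator interaction. It is not the debiased high-dimensional estimator from the talk; the model form, the function names `fit_ols` and `mediation_effects`, and all simulation parameters are assumptions chosen for illustration.

```python
import numpy as np

def fit_ols(design, y):
    """Least-squares coefficients for y ~ design (design includes the intercept)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def mediation_effects(t, m, y):
    """Natural direct and indirect effects in a toy linear model (assumed form).

    Mediator model: M = a0 + a1*T + noise
    Outcome model:  Y = b0 + b1*T + b2*M + b3*T*M + noise
    """
    ones = np.ones_like(t, dtype=float)
    a0, a1 = fit_ols(np.column_stack([ones, t]), m)
    b0, b1, b2, b3 = fit_ols(np.column_stack([ones, t, m, t * m]), y)
    nde = b1 + b3 * a0      # E[Y(1, M(0)) - Y(0, M(0))]
    nie = (b2 + b3) * a1    # E[Y(1, M(1)) - Y(1, M(0))]
    return nde, nie

# Simulated data with known coefficients (all values are illustrative).
rng = np.random.default_rng(0)
n = 20000
t = rng.integers(0, 2, size=n).astype(float)
m = 0.5 + 1.0 * t + rng.normal(size=n)
y = 1.0 + 2.0 * t + 1.5 * m + 0.5 * t * m + rng.normal(size=n)

nde, nie = mediation_effects(t, m, y)
# Targets under this simulation: NDE = 2.0 + 0.5 * 0.5 = 2.25, NIE = (1.5 + 0.5) * 1.0 = 2.0
```

With many mediators and covariates, plain least squares as used here is no longer available, which is the regime the abstract targets.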
The TransformerPseudo model, an advanced survival Transformer, is specifically designed to analyze high-dimensional, longitudinal multi-omics data while addressing the challenges of covariate-dependent censoring. By transforming survival outcomes into pseudo probabilities, the model circumvents the need for observed survival times and censoring variables, enabling robust estimation of covariate effects on patient survival. Its architecture leverages positional encodings and multi-head attention mechanisms to efficiently capture temporal dependencies and high-dimensional feature interactions, reducing information loss and simplifying the analysis by eliminating the reliance on random effects. Furthermore, the model employs SHAP values for interpretable visualization, offering comprehensive insights into the impact of multi-omics variables. Validated through extensive simulations and real-world applications using the TEDDY disease datasets, the TransformerPseudo model consistently achieves superior predictive accuracy and outperforms traditional methodologies.
Keywords
Survival Analysis
Transformer
Deep Learning
Covariate-dependent censoring
High-dimensional longitudinal multi-omics
Robust estimation
Co-Author
Jiyuan Hu, NYU Grossman School of Medicine
First Author
Yeji Kim, New York University, Division of Biostatistics, Department of Population Health
Presenting Author
Yeji Kim, New York University, Division of Biostatistics, Department of Population Health
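The TransformerPseudo abstract above relies on transforming survival outcomes into pseudo probabilities. One common way to build such targets is the jackknife pseudo-observation based on the Kaplan-Meier estimator, sketched below. This is an assumption about the general technique, not the model's actual construction: the plain Kaplan-Meier version shown here does not adjust for covariate-dependent censoring, and the function names are hypothetical.

```python
import numpy as np

def km_survival(time, event, t0):
    """Kaplan-Meier estimate of S(t0) = P(T > t0)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0
    # Multiply the conditional survival factors over distinct event times up to t0.
    for u in np.unique(time[event & (time <= t0)]):
        at_risk = np.sum(time >= u)
        deaths = np.sum((time == u) & event)
        surv *= 1.0 - deaths / at_risk
    return surv

def pseudo_observations(time, event, t0):
    """Jackknife pseudo-observations: n * S(t0) - (n - 1) * S_without_i(t0)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    idx = np.arange(n)
    full = km_survival(time, event, t0)
    leave_one_out = np.array(
        [km_survival(time[idx != i], event[idx != i], t0) for i in range(n)]
    )
    return n * full - (n - 1) * leave_one_out

# Without censoring, each pseudo-observation reduces to the indicator 1{T_i > t0}
# (up to floating-point error), which makes a convenient sanity check.
time = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
event = np.array([1, 1, 1, 1, 1])
pseudo = pseudo_observations(time, event, t0=2.5)
```

The pseudo-observations can then serve as regression targets for every subject, including those who were censored.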
Envelope methods perform dimension reduction of predictors or responses in multivariate regression, exploiting the relationship between them to improve estimation efficiency. While most research on envelopes has focused on their estimation properties, certain envelope estimators have been shown to excel at prediction in both low and high dimensions. We propose to further improve prediction through envelope-guided regularization (EgReg), a novel method which uses envelope-derived information to guide shrinkage along the principal components (PCs) of the predictor matrix. We situate EgReg among other PC-based regression methods and envelope methods to motivate its development. We show that EgReg delivers lower prediction risk than a closely related non-shrinkage envelope estimator in fixed dimensions and in an asymptotic regime where the true intrinsic dimension of the predictors and n diverge proportionally. We compare the prediction performance of EgReg with envelope methods and other PC-based prediction methods in simulations and a real data example, observing improved prediction performance over these alternative approaches in general.
Keywords
double descent
predictor envelopes
principal components
shrinkage estimator
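The envelope abstract above describes shrinkage along the principal components of the predictor matrix. The sketch below shows the generic mechanism only: each least-squares coefficient in PC coordinates is multiplied by a factor in [0, 1]. How EgReg derives its factors from envelope information is not given in the abstract, so the factors here are stand-ins (ridge and principal component regression), and the function name `pc_shrinkage_fit` is hypothetical.

```python
import numpy as np

def pc_shrinkage_fit(X, y, factors):
    """Coefficients with per-component shrinkage along the PCs of X.

    Assumes X is column-centered with full column rank and y is centered.
    factors[j] multiplies the least-squares coefficient on the j-th PC.
    """
    factors = np.asarray(factors, dtype=float)
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    # Least-squares coefficients in PC coordinates are (U^T y) / d.
    gamma = factors * (U.T @ y) / d
    return Vt.T @ gamma

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
X -= X.mean(axis=0)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(size=60)
y -= y.mean()

lam = 3.0
d = np.linalg.svd(X, compute_uv=False)
# Ridge regression corresponds to factors d_j^2 / (d_j^2 + lambda).
beta_ridge = pc_shrinkage_fit(X, y, d**2 / (d**2 + lam))
# Principal component regression keeps the first k components and drops the rest.
beta_pcr = pc_shrinkage_fit(X, y, [1, 1, 0, 0, 0])
```

Ridge and principal component regression differ only in the factor vector, which is the slot where a method-specific shrinkage rule would go.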
Restricted maximum likelihood (REML) estimators are commonly used to obtain unbiased estimates of the variance components in linear mixed models. Nowadays, the dimension of the design matrix with respect to the random effects may be high, especially in genetic association studies. Motivated by this, I will first introduce high-dimensional kernel linear mixed models. The REML equations will then be derived, followed by a discussion of the consistency of REML estimators for some commonly used kernel matrices. The validity of the theory is demonstrated via simulation studies.
Keywords
Kernel Methods
Inner Product Random Matrices
Restricted Maximum Likelihood Estimator
Consistency
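To make the REML abstract above concrete, the sketch below evaluates the restricted log-likelihood for a two-component model y ~ N(Xb, sigma2_g * K + sigma2_e * I). The linear kernel, the simulation settings, and the grid search are assumptions for illustration and are not taken from the talk. As a sanity check, when the kernel component is switched off, the REML estimate of the residual variance should match the familiar unbiased estimate RSS / (n - p).

```python
import numpy as np

def neg_restricted_loglik(sigma2_g, sigma2_e, y, X, K):
    """Negative restricted log-likelihood, up to an additive constant."""
    n = len(y)
    V = sigma2_g * K + sigma2_e * np.eye(n)
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    XtVinvX = X.T @ Vinv_X
    b_hat = np.linalg.solve(XtVinvX, X.T @ Vinv_y)   # generalized least squares
    resid = y - X @ b_hat
    quad = resid @ np.linalg.solve(V, resid)          # equals y' P y
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XtVinvX = np.linalg.slogdet(XtVinvX)
    return 0.5 * (logdet_V + logdet_XtVinvX + quad)

rng = np.random.default_rng(2)
n, p, q = 50, 3, 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Z = rng.normal(size=(n, q))
K = Z @ Z.T / q                                       # linear kernel (one possible choice)
u = rng.normal(scale=np.sqrt(0.5), size=q)
y = X @ np.array([1.0, 0.5, -0.5]) + Z @ u / np.sqrt(q) + rng.normal(size=n)

# Sanity check with sigma2_g = 0: minimize over sigma2_e on a fine grid.
grid = np.linspace(0.2, 5.0, 4801)
values = [neg_restricted_loglik(0.0, s2, y, X, K) for s2 in grid]
sigma2_e_reml = grid[int(np.argmin(values))]

ols_resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
sigma2_e_unbiased = ols_resid @ ols_resid / (n - p)   # divides by n - p, not n
```

In practice both variance components would be optimized jointly with a numerical optimizer rather than a one-dimensional grid.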
Regression models with high-dimensional response are increasingly ubiquitous across various domains, including computer experiments with high-dimensional output. Current methodology involves compressing the response using Unsupervised Dimension Reduction (UnsuperDR) techniques such as Singular Value Decomposition (SVD), and training regression models to predict the compressed values. We implement a novel Supervised Dimension Reduction (SuperDR) approach to infer an optimal linear compression within a comprehensive statistical model to simultaneously compress and predict high-dimensional response variables. Leveraging recent advances in SuperDR for linear models and regression modeling for multivariate output, our approach alternates between estimating a compressed regression model and an expansion matrix, theoretically converging to an optimal solution. Our framework is agnostic to the chosen regression model, as demonstrated by our implementation with Polynomial Chaos Expansion and Random Forests regression. We compare the effectiveness of SuperDR against the state-of-the-art UnsuperDR framework.
Keywords
Supervised Dimension Reduction
High-Dimensional Response
Nonlinear Regression
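The high-dimensional response abstract above contrasts its supervised approach with the current practice of compressing the response by SVD and regressing the compressed values. The sketch below illustrates only that unsupervised baseline, using ordinary linear regression as a stand-in for the regression model. The function names and the simulated data are hypothetical, and the alternating SuperDR procedure itself is not reproduced here.

```python
import numpy as np

def fit_svd_compressed_regression(X, Y, rank):
    """Compress Y with a truncated SVD, then regress the scores on X."""
    y_mean = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - y_mean, full_matrices=False)
    basis = Vt[:rank].T                        # (q, rank) expansion matrix
    scores = (Y - y_mean) @ basis              # compressed responses, (n, rank)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return y_mean, basis, coef

def predict_svd_compressed(X_new, y_mean, basis, coef):
    """Predict compressed scores, then expand back to the full response."""
    design = np.column_stack([np.ones(len(X_new)), X_new])
    return y_mean + (design @ coef) @ basis.T

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
B = rng.normal(size=(2, 200))
Y = 1.0 + X @ B                                # noiseless response with rank-2 signal

model = fit_svd_compressed_regression(X, Y, rank=2)
X_new = rng.normal(size=(5, 2))
Y_pred = predict_svd_compressed(X_new, *model)
```

Here the basis is chosen without reference to X. A supervised approach would instead choose the compression jointly with the regression fit, as the abstract describes.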
Identifying differences between biological networks, particularly in disease settings, can point to targets for diagnosis and treatment. Differential network analysis has been used to characterize gene expression network differences in cancer vs normal tissue, changes in brain networks over time in Alzheimer's, and gut microbial networks under various conditions. We identified 40+ methods for estimating and testing network differences, which are implemented in over 25 R and Python packages. There is currently no practical review or comparative study of this wide variety of software. We will present a comprehensive application-focused review and comparative evaluation, with the aim of providing a resource for applied researchers to select and put these methods into practice. We will give accessible explanations of the overarching methods and provide reproducible code examples for select methods using publicly available biological datasets. We focus on comparing popular frequentist estimation (Joint Graphical Lasso, Danaher 2014 and iDingo, Class 2017) with a Bayesian counterpart (Spike and Slab Joint Graphical Lasso, Li 2019).
Keywords
graphical models
biological pathway estimation
differential network analysis
Bayesian network estimation
software application
method overview
Causal mediation analysis provides a framework to understand the causal pathways through which an independent variable affects an outcome via intermediary mediators. However, in high-dimensional settings, where the number of covariates and mediators far exceeds the number of observations, traditional methods often fail to capture the complexity of the data. To address this, we propose a novel approach that leverages latent factor models to estimate the mediation functional via a modified diversified projection method that does not require knowing the true dimension of the latent factors. We present a √n-consistent and asymptotically normal estimator that allows for interaction between covariates and treatment in generating mediators, as well as covariate-treatment and mediator-treatment interactions in generating the response. To demonstrate the practical relevance of our approach, we analyze data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, investigating how DNA methylation mediates the progression of Alzheimer's Disease (AD). Moreover, our findings are supported by extensive simulations and by an investigation of the effect of the Geriatric Depression Scale on the Alzheimer's Disease Assessment Scale – Cognitive Subscale (ADAS-Cog), mediated via DNA methylation.
Keywords
Latent Factors
Mediation Analysis
High-Dimensional Models
Factor Model
Diversified Projection Method