Wednesday, Aug 6: 8:30 AM - 10:20 AM
0568
Topic-Contributed Paper Session
Music City Center
Room: CC-201B
This session brings together leading researchers whose work at the intersection of statistical and machine learning methods has led to significant advancements, emphasizing applications in healthcare and omics data analysis. Each presentation will explore recent innovations in methodology, including novel classification techniques, robust statistical models, and their applications to real-world problems such as substance use prediction in healthcare professionals, online learning algorithms, and the analysis of multi-omic data. By showcasing these diverse yet complementary approaches, the session aims to highlight the critical role of modern statistical learning in addressing complex challenges across various domains. Attendees will gain insights into both the theoretical foundations and practical applications of these cutting-edge methods, contributing to a broader understanding of how statistical learning can enrich society in the era of AI and big data.
Applied
Yes
Main Sponsor
International Chinese Statistical Association
Co Sponsors
Biometrics Section
Section on Statistical Learning and Data Science
Presentations
We propose an empirical variational Bayesian approach to factorization of linked matrices that has several advantages over existing techniques. It has the flexibility to accommodate shared signal over any number of row or column sets (i.e., bidimensional integration), an intuitive model-based objective function that yields appropriate shrinkage for the inferred signals, and a relatively efficient estimation algorithm with no tuning parameters. A general result establishes conditions for the uniqueness of the underlying decomposition for a broad family of methods that includes the proposed approach. For scenarios with missing data, we describe an associated iterative imputation approach that is novel for the single-matrix context and a powerful approach for "blockwise" imputation (in which an entire row or column is missing) in various linked matrix contexts. The approach is applied to gene expression and miRNA data from breast cancer tissue and normal breast tissue, for which it gives an informative decomposition of variation and outperforms alternative strategies for missing data imputation.
Keywords
Data integration
Missing data imputation
Matrix completion
Low-rank factorization
Variational Bayes
Dimension reduction
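As a rough illustration of the low-rank imputation idea mentioned in the abstract above, the sketch below fills missing entries of a single matrix by alternating least squares on a rank-1 fit. It is not the empirical variational Bayes method of the talk: there is no shrinkage, no linked matrices, and the rank is fixed at 1. The function name `complete_rank1`, the use of `None` to mark missing entries, and the toy matrix are all assumptions made for this example.

```python
def complete_rank1(X, n_iter=200):
    """Fill None entries of X using a rank-1 fit u v^T to the observed entries.

    Assumes every row and every column has at least one observed entry.
    A fully missing row or column ("blockwise" missingness) cannot be
    handled by this single-matrix sketch.
    """
    n_rows, n_cols = len(X), len(X[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iter):
        # Update each row factor by least squares over that row's observed entries.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if X[i][j] is not None]
            u[i] = sum(X[i][j] * v[j] for j in obs) / sum(v[j] ** 2 for j in obs)
        # Update each column factor the same way, using the new row factors.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if X[i][j] is not None]
            v[j] = sum(X[i][j] * u[i] for i in obs) / sum(u[i] ** 2 for i in obs)
    # Keep observed values; impute missing ones from the fitted factors.
    return [
        [X[i][j] if X[i][j] is not None else u[i] * v[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy example (hypothetical data): an exactly rank-1 matrix with two entries hidden.
a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.0, 3.0]
truth = [[ai * bj for bj in b] for ai in a]
observed = [row[:] for row in truth]
observed[0][0] = None
observed[3][2] = None
completed = complete_rank1(observed)
```

Because the toy matrix is exactly rank 1 and noise-free, the hidden entries can be compared directly with `truth`. With noisy or higher-rank data, a fit like this would need a rank choice and some form of regularization, which is the gap a model-based objective is meant to address.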
Cancer genomic research provides a significant opportunity to identify cancer risk-associated genes but often suffers from undesirably low statistical power due to limited sample sizes. Integrated analysis across different cancers has the potential to enhance statistical power for identifying pan-cancer risk genes. However, substantial heterogeneity among cancers makes this challenging. We developed a novel asymmetric integration method that addresses data heterogeneity and excludes uninformative datasets from the analysis. We applied this method to integrate genotype datasets with matched case and control individuals, using each cancer type as the primary dataset of interest and treating other cancers as auxiliary datasets. At the same FDR threshold, the integrated analysis identified more potential genetic variants and genes associated with cancer risk, highlighting the promise of this approach for integrating cancer datasets.
Keywords
asymmetric data integration
cancer risk-associated genetic variants and genes
Understanding the impact of treatments on different populations is a fundamental challenge in causal inference. In this talk, we introduce a novel approach for estimating the Average Treatment Effect (ATE) and establish a direct connection to Individualized Treatment Rules (ITRs) using a scale-space matching framework. Our method refines treatment effect estimation by capturing variations across scales, enabling a more flexible and robust analysis of heterogeneous treatment effects. Through a series of simulations and real-world examples, we illustrate the advantages of our method in comparison to existing techniques. This work provides a new perspective on bridging global and personalized treatment effects, offering practical insights for data-driven decision-making in healthcare, policy evaluation, and other applied domains.
In many social, behavioral, and biomedical sciences, treatment effect estimation is a crucial step in understanding the impact of an intervention, policy, or treatment. In recent years, an increasing emphasis has been placed on heterogeneity in treatment effects, leading to the development of various methods for estimating Conditional Average Treatment Effects (CATE). These approaches hinge on a crucial identifying condition of no unmeasured confounding, an assumption that is not always guaranteed in observational studies or randomized controlled trials with non-compliance. In this paper, we propose a general framework for estimating CATE in the presence of a possible unmeasured confounder using Instrumental Variables. We also construct estimators that exhibit greater efficiency and robustness against various scenarios of model misspecification. The efficacy of the proposed framework is demonstrated through simulation studies and a real data example.
Keywords
Heterogeneous Treatment Effect
Instrumental Variables
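To make the role of an instrument concrete, the sketch below simulates data with an unmeasured confounder and compares a naive difference in means with the textbook Wald (ratio) estimator computed separately within each level of a binary covariate. This is a simple stand-in, not the framework proposed in the abstract above. The data-generating process, the function names, and all parameter values are assumptions chosen for illustration.

```python
import random


def simulate(n=40000, seed=0):
    """Simulate (x, z, d, y) rows with an unmeasured confounder u (hypothetical setup)."""
    rng = random.Random(seed)
    true_effect = {0: 1.0, 1: 3.0}  # effect of d on y differs by covariate x
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)       # observed binary covariate
        z = rng.randint(0, 1)       # binary instrument, independent of u
        u = rng.gauss(0, 1)         # unmeasured confounder
        d = 1 if z + u + rng.gauss(0, 1) > 0.5 else 0   # treatment depends on z and u
        y = true_effect[x] * d + 1.5 * u + rng.gauss(0, 1)  # z affects y only through d
        rows.append((x, z, d, y))
    return rows, true_effect


def mean(values):
    return sum(values) / len(values)


def naive_by_group(rows):
    """Difference in mean outcome between treated and untreated, within each x."""
    out = {}
    for x in (0, 1):
        treated = [y for xi, _, d, y in rows if xi == x and d == 1]
        control = [y for xi, _, d, y in rows if xi == x and d == 0]
        out[x] = mean(treated) - mean(control)
    return out


def wald_by_group(rows):
    """Wald estimator within each x: effect of z on y divided by effect of z on d."""
    out = {}
    for x in (0, 1):
        y1 = mean([y for xi, z, _, y in rows if xi == x and z == 1])
        y0 = mean([y for xi, z, _, y in rows if xi == x and z == 0])
        d1 = mean([d for xi, z, d, _ in rows if xi == x and z == 1])
        d0 = mean([d for xi, z, d, _ in rows if xi == x and z == 0])
        out[x] = (y1 - y0) / (d1 - d0)
    return out


rows, true_effect = simulate()
naive = naive_by_group(rows)
estimates = wald_by_group(rows)
```

In this simulation the effect is constant within each level of `x`, so the group-wise Wald ratio targets `true_effect[x]`, while the naive contrast is pushed upward by the confounder. With effects that vary within a group, the same ratio would instead target an effect among compliers, and it requires a monotonicity assumption.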
Recent advances in causal inference have shifted the focus from estimating average treatment effects to individual treatment effects (ITE). We propose a novel Structure Maintained Representation Learning (SMRL) approach to improve ITE estimation by preserving the correlation between baseline covariates and their learned representations. Our method introduces a discriminator to balance distributional alignment and information retention, minimizing an upper bound on treatment estimation error. We demonstrate SMRL's superiority over existing methods through extensive experiments on both simulated and real-world datasets, including EHR data from the MIMIC-III database.
Keywords
causal inference