Advanced Research in High Dimensional Regression

Danh Nguyen, Chair
University of California-Irvine
 
Wednesday, Aug 6: 8:30 AM - 10:20 AM
4140 
Contributed Papers 
Music City Center 
Room: CC-103A 
This session presents recent research in high-dimensional regression, including new methods for longitudinal data, diurnal data, and spatial data with high-dimensional covariates.

Main Sponsor

Biometrics Section

Presentations

WITHDRAWN: Debiased network constrained sparse group lasso with applications to high-dimension longitudinal omics data

There is growing interest in longitudinal omics data paired with a longitudinal clinical outcome. Given a large set of continuous omics variables and a continuous clinical outcome, each measured for a few subjects at only a few time points, we seek to identify the variables that co-vary over time with the outcome in one or more treatment groups. To motivate this problem, we study a dataset with hundreds of urinary metabolites and tuberculosis mycobacterial load as the clinical outcome, with the objective of identifying potential biomarkers for disease progression in two treatment groups. For such data, clinicians usually apply simple linear mixed effects models, which often lack power given the low number of replicates and time points. Our previous method, PROLONG, combines group lasso and network Laplacian penalties on first-differenced data, increasing power and utilizing the variance across both time and omics features. We extend the PROLONG model to multiple treatment groups by debiasing the group lasso plus Laplacian model and performing inference on the debiased estimator.
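The combination of penalties described above can be illustrated with a minimal sketch. It evaluates a squared-error loss plus a group lasso penalty and a network Laplacian penalty, with a helper for first-differencing a series. The function names, the tuning parameters lam_group and lam_net, the square-root group weights, and the unweighted graph are assumptions made for illustration; this is not the PROLONG implementation.

```python
import math

def first_difference(series):
    """Return successive differences x[t+1] - x[t] of a list of numbers."""
    return [b - a for a, b in zip(series, series[1:])]

def group_lasso_penalty(beta, groups):
    """Sum over groups of sqrt(group size) times the Euclidean norm of that group."""
    total = 0.0
    for idx in groups:
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        total += math.sqrt(len(idx)) * norm
    return total

def laplacian_penalty(beta, edges):
    """Quadratic form beta' L beta for an unweighted graph, written as a sum over edges."""
    return sum((beta[i] - beta[j]) ** 2 for i, j in edges)

def penalized_objective(X, y, beta, groups, edges, lam_group, lam_net):
    """Squared-error loss plus the two penalties (hypothetical tuning parameters)."""
    n = len(y)
    rss = 0.0
    for row, yi in zip(X, y):
        fitted = sum(x * b for x, b in zip(row, beta))
        rss += (yi - fitted) ** 2
    return (rss / (2 * n)
            + lam_group * group_lasso_penalty(beta, groups)
            + lam_net * laplacian_penalty(beta, edges))
```

In this sketch the group penalty can zero out whole groups of coefficients, while the Laplacian term shrinks coefficients of connected variables toward each other.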

Keywords

Omics

High Dimensional

Metabolomics

Regression 

Co-Author(s)

Martin Wells, Cornell University
Sumanta Basu, Cornell University Department of Statistics and Data Science
Myung Hee Lee, Weill Cornell Medicine

First Author

Steven Broll, Cornell University

Doubly regularized generalized linear models for spatial data with high-dimensional covariates

A discrete spatial lattice can be cast as a network structure over which spatially-correlated outcomes are observed. A second network structure may also capture similarities among measured features, when such information is available. Incorporating the network structures when analyzing such doubly-structured data can improve predictive power, and lead to better identification of important features in the data-generating process. Motivated by applications in spatial disease mapping, we develop a new doubly regularized regression framework to incorporate these network structures for analyzing high-dimensional datasets. Our estimators can be easily implemented with standard convex optimization algorithms. In addition, we describe a procedure to obtain asymptotically valid confidence intervals and hypothesis tests for our model parameters. We show empirically that our framework provides improved predictive accuracy and inferential power compared to existing high-dimensional spatial methods. These advantages hold given fully accurate network information, and also with networks which are partially misspecified or uninformative. 
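To make the idea of casting a lattice as a network concrete, the sketch below builds the graph Laplacian of a rectangular grid and evaluates a penalty with two quadratic terms, one over a feature network and one over the spatial lattice. The abstract does not specify the form of the penalties, so the Laplacian quadratic forms, the names double_penalty, lam_feat and lam_space, and the split into coefficients beta and site-level effects gamma are all assumptions for illustration.

```python
def lattice_edges(rows, cols):
    """Edges between horizontally and vertically adjacent cells of a rows x cols grid."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))      # right neighbour
            if r + 1 < rows:
                edges.append((i, i + cols))   # neighbour below
    return edges

def graph_laplacian(n, edges):
    """Unweighted graph Laplacian (degree matrix minus adjacency matrix) as nested lists."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L

def quad_form(L, v):
    """Compute v' L v."""
    n = len(v)
    return sum(v[i] * L[i][j] * v[j] for i in range(n) for j in range(n))

def double_penalty(beta, L_feat, gamma, L_space, lam_feat, lam_space):
    """Hypothetical double penalty: smooth beta over features and gamma over sites."""
    return lam_feat * quad_form(L_feat, beta) + lam_space * quad_form(L_space, gamma)
```

A constant vector incurs no penalty under either term, so only differences between connected features or neighbouring sites are shrunk.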

Keywords

high-dimensional data

penalized regression

spatial data

networks 

Co-Author(s)

Si Cheng, University of Washington
Ali Shojaie, University of Washington

First Author

Arjun Sondhi

Presenting Author

Arjun Sondhi

Flexible Modeling Framework for Self-Exciting Diurnal Processes with Applications to Smartphone Use

Improving strength of routine is a target for many therapies and treatments of mood and affective disorders. Smartphone usage data enables us to model person-specific diurnal patterns of usage that provide useful insight into a person's routine and behavior. Considering phone usage as a point process, existing approaches focus on capturing self-exciting behavior, the phenomenon where the rate of usage is heightened during and immediately after using one's phone. While this self-exciting phenomenon is important, there are limited methods that also allow for flexible modeling of diurnal effects on the rate of smartphone usage. We propose a framework that can combine the self-exciting Hawkes process with a penalized Fourier series to capture important diurnal trends. Through simulation experiments and an application to a cohort of patients with affective disorders, we show the benefit of models that account for self-exciting and diurnal patterns concurrently. 
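As an illustration of how a self-exciting term and a diurnal term can be combined, the sketch below computes a conditional intensity made of a 24-hour Fourier baseline plus an exponentially decaying contribution from past events, together with a roughness penalty on the Fourier coefficients. The log-link baseline, the exponential kernel, the k-squared penalty weights, and all function and parameter names are assumptions for illustration, not details taken from the proposed framework.

```python
import math

PERIOD_HOURS = 24.0  # assumed diurnal period

def diurnal_baseline(t, a0, cos_coefs, sin_coefs):
    """Positive baseline rate: exp of a truncated Fourier series in time of day."""
    s = a0
    for k, (a, b) in enumerate(zip(cos_coefs, sin_coefs), start=1):
        w = 2.0 * math.pi * k * t / PERIOD_HOURS
        s += a * math.cos(w) + b * math.sin(w)
    return math.exp(s)

def hawkes_intensity(t, events, a0, cos_coefs, sin_coefs, alpha, decay):
    """Baseline plus an exponentially decaying bump from each event strictly before t."""
    excite = sum(alpha * math.exp(-decay * (t - ti)) for ti in events if ti < t)
    return diurnal_baseline(t, a0, cos_coefs, sin_coefs) + excite

def fourier_penalty(cos_coefs, sin_coefs, lam):
    """Penalty that weights harmonic k by k^2, discouraging wiggly baselines."""
    return lam * sum(k ** 2 * (a * a + b * b)
                     for k, (a, b) in enumerate(zip(cos_coefs, sin_coefs), start=1))
```

For example, hawkes_intensity(5.0, [4.0], 0.0, [], [], 0.5, 1.0) adds 0.5 * exp(-1) to a flat baseline of 1, reflecting the heightened rate one hour after an event.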

Keywords

mobile health

longitudinal and correlated data

point processes

diurnal patterns

event data

mental health 

Co-Author

Ian Barnett, University of Pennsylvania

First Author

Ryan Xie

Presenting Author

Ryan Xie

Functional dynamic models with functional and scalar predictors prone to measurement errors

Extensive literature explores the modeling of dynamic and functional responses using functional regression approaches that apply smoothing techniques to capture complex data trends in functional covariates and time-varying scalar predictors. However, there has been relatively less focus on understanding how longitudinal and functional predictors prone to measurement errors influence dynamic functional outcomes. Addressing this gap, we propose a functional dynamic modeling framework that accounts for measurement errors in both functional and scalar predictors. This approach aims to enhance our understanding of how self-reported mealtimes, which serve as longitudinal measures, influence glycemic dynamics over time. Additionally, our model incorporates actigraphy-measured physical activity, which is prone to measurement errors, to provide a more comprehensive analysis. Finite sample properties were established through simulations. We applied the methods to data from a prospective cohort study of 277 healthy pregnant women to determine optimal meal timing and its association with dynamic glycemic outcomes in pregnancy. 

Keywords

Functional data

Glycemic dynamics

Measurement Error

Optimal Meal Timing

Physical Activity

Meal Type 

Co-Author(s)

Roger S Zoh, Indiana University
Xue Lan, Oregon State University
See Ling Loy, Duke-NUS Medical School
Carmen Tekwe, Indiana University

First Author

Mercy Oladuti

Presenting Author

Mercy Oladuti

Inference on the Significance of Modalities in Multimodal Generalized Linear Models

Multimodal statistical models have gained much attention in recent years, yet rigorous statistical tools for inferring the significance of a single modality within a multimodal model are lacking. This inference problem is particularly challenging in high-dimensional multimodal models. For high-dimensional multimodal generalized linear models, we propose a novel entropy-based metric, called the Expected Relative Entropy (ERE), to quantify the information gained from one modality in addition to all other modalities in the model. We then propose a deviance-based statistic to estimate the ERE. We prove that the deviance-based statistic is a consistent estimator of the ERE and derive its asymptotic distribution, which enables the calculation of confidence intervals and p-values to assess the significance of a given modality. We numerically evaluate the empirical performance of the proposed inference tool on various high-dimensional multimodal generalized linear models and demonstrate its good performance. We also apply our method to a multimodal neuroimaging dataset to demonstrate its capability to infer the significance of imaging modalities, which is crucial for neuroscience studies.
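The general idea of a deviance-based comparison can be sketched for a binary outcome. The code below computes the Bernoulli deviance under fitted probabilities from a model without a given modality and from a model with it, and reports the average log-likelihood gain. This is a generic nested-model comparison under assumed names (bernoulli_deviance, avg_deviance_gain); it is not the ERE estimator from the talk, whose exact definition is not given in the abstract.

```python
import math

def bernoulli_deviance(y, p):
    """Minus twice the Bernoulli log-likelihood of binary outcomes y under probabilities p."""
    loglik = sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
                 for yi, pi in zip(y, p))
    return -2.0 * loglik

def avg_deviance_gain(y, p_reduced, p_full):
    """Average log-likelihood gain of the full model over the reduced model."""
    n = len(y)
    return (bernoulli_deviance(y, p_reduced) - bernoulli_deviance(y, p_full)) / (2.0 * n)
```

A value near zero suggests that the extra modality adds little beyond the others on these data; the fitted probabilities must lie strictly between 0 and 1 for the logarithms to be defined.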

Keywords

High-dimensional inference

Multimodal data

Relative Entropy

Sure Independence Screening 

Co-Author(s)

Quefeng Li, University of North Carolina Chapel Hill
Guorong Wu, UNC

First Author

Wanting Jin

Presenting Author

Wanting Jin

Targeted Learning of Heterogeneous Sources by Informative Feature Sharing

Transfer learning has been proven useful for leveraging information from multiple similar source datasets to enhance the performance of the target model. A fundamental challenge in transfer learning is avoiding negative transfer when there is heterogeneity among the sources and between the source and target datasets. Traditional methods are typically based on identifying informative sources. This creates a binary all-in or all-out decision, potentially resulting in the loss of useful information. In this paper, we introduce Targeted-IFS, a new transfer learning framework for high-dimensional Generalized Linear Models (GLMs) under heterogeneous sources. To avoid negative transfer and ensure effective transfer of useful information from sources, Targeted-IFS employs a pre-transfer debiasing step to correct estimates of selected informative features across all sources, rather than selecting the informative sources. We theoretically show that the Targeted-IFS method avoids negative transfer, achieving a convergence rate no worse than the classical LASSO using only target data, regardless of source heterogeneity. Simulations confirm its robustness to complex source heterogeneity and imp 

Keywords

Generalized linear model

heterogeneity

informative support

negative transfer

robust transfer learning 

Co-Author(s)

Yudong Wang, University of Pennsylvania, Perelman School of Medicine
Tingyin Wang
Yumou Qiu, Peking University
Yang Ning, Cornell University
Yong Chen, University of Pennsylvania, Perelman School of Medicine

First Author

Jie Hu, University of Pennsylvania

Presenting Author

Jie Hu, University of Pennsylvania

Tractable Conditional Density Estimation using Logistic Gaussian Process

Conditional density estimation in high-dimensional data has been studied extensively in recent times. In this talk, we propose a model to estimate the conditional density of a response that varies spatially, given a covariate vector and a specific location. Using a variation of logspline models, we nonparametrically approximate the unknown link with a triangular basis expansion and a Gaussian prior on the coefficients. We show that the posterior contracts to the true density at a minimax optimal rate (up to a logarithmic constant). We evaluate the performance of our method in numerous simulations and compare the results with related high-dimensional density estimation techniques. We illustrate our method on a summary measure, fractional anisotropy, collected from 213 subjects at 83 brain locations in a dataset generated by the Alzheimer's Disease Neuroimaging Initiative, to identify the functional relationship between the covariates and the response across locations.
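The basic construction behind a logistic Gaussian process density can be sketched in one dimension. The code below expands a log-density in triangular (hat) basis functions on an equally spaced grid, draws the coefficients from a Gaussian prior, and normalizes by numerical integration. It is an unconditional, single-location simplification: the dependence on covariates and location described in the abstract is omitted, and the names, the trapezoid rule, and the independent standard normal prior are assumptions for illustration.

```python
import math
import random

def hat_basis(y, knots, j):
    """Triangular basis function centred at knots[j]; knots must be equally spaced."""
    h = knots[1] - knots[0]
    return max(0.0, 1.0 - abs(y - knots[j]) / h)

def log_density_unnormalized(y, knots, coefs):
    """Piecewise-linear log-density: sum of coefficients times hat functions."""
    return sum(c * hat_basis(y, knots, j) for j, c in enumerate(coefs))

def density(y, knots, coefs, grid_size=2000):
    """Normalize exp(g) over [knots[0], knots[-1]] with the trapezoid rule."""
    lo, hi = knots[0], knots[-1]
    step = (hi - lo) / grid_size
    vals = [math.exp(log_density_unnormalized(lo + i * step, knots, coefs))
            for i in range(grid_size + 1)]
    norm_const = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return math.exp(log_density_unnormalized(y, knots, coefs)) / norm_const

def draw_prior_coefs(n_knots, sd=1.0, seed=0):
    """Independent Gaussian coefficients (a simplifying assumption about the prior)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sd) for _ in range(n_knots)]
```

With all coefficients equal to zero the sketch returns the uniform density on the knot range, and any draw from draw_prior_coefs yields a strictly positive density after normalization.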

Keywords

Density estimation

Posterior Contraction

Spline

Scalable approximation 

Co-Author(s)

Debdeep Pati, University of Wisconsin-Madison
Jaehoan Kim, Duke University

First Author

Indrajit Ghosh, Texas A&M University

Presenting Author

Dipankar Bandyopadhyay, Virginia Commonwealth University