Navigating High-Dimensional Landscapes: Innovations in Model Estimation and Predictive Inference

Abstract Number:

1669 

Submission Type:

Topic-Contributed Paper Session 

Participants:

Luca Sartore (1), David Matteson (3), Valbona Bejleri (4), Piaomu Liu (2), Ivy Zhang (5), Johannes Bleher (6), Aliaksandr Hubin (7)

Institutions:

(1) National Institute of Statistical Sciences, N/A, (2) Bentley University, N/A, (3) Cornell University & National Institute of Statistical Sciences, N/A, (4) United States Department of Agriculture – National Agricultural Statistics Service, N/A, (5) Stanford University, N/A, (6) University of Hohenheim, N/A, (7) University of Oslo, N/A

Chair:

Piaomu Liu  
Bentley University

Co-Organizer:

David Matteson  
Cornell University & National Institute of Statistical Sciences

Discussant:

Valbona Bejleri  
United States Department of Agriculture – National Agricultural Statistics Service

Session Organizer:

Luca Sartore  
National Institute of Statistical Sciences

Speaker(s):

Luca Sartore  
National Institute of Statistical Sciences
Ivy Zhang  
Stanford University
Johannes Bleher  
University of Hohenheim
Aliaksandr Hubin  
University of Oslo

Session Description:

Traditional regression approaches are not suitable for analyzing high-dimensional data sets. Recent advances in big-data analytics have enabled the sparse selection of informative variables, enhancing both the interpretability and the predictive accuracy of models for high-dimensional data. However, several challenges in high-dimensional spaces remain unaddressed in the statistical literature. For example, from a frequentist perspective, model selection and its properties have not been fully studied in capture-recapture contexts or for data from heterogeneous domains. From a Bayesian perspective, approaches to modeling high-dimensional data sets focus on stochastic variable selection, adaptive shrinkage, or model averaging. Nevertheless, current state-of-the-art Bayesian methods are not fully equipped to simultaneously handle hierarchical population structures, heteroscedastic designs, various missing data mechanisms, and different levels of missingness. Addressing these challenges requires new methods that are also more computationally efficient than existing techniques. Such innovations are crucial for advances in fields such as econometrics, healthcare, and the social sciences. Overall, this session presents diverse perspectives for advancing high-dimensional analytics, providing reliable and effective alternatives for statistical practitioners.

Luca Sartore from the National Institute of Statistical Sciences will open the session with an advanced variable selection method designed for the US Census of Agriculture. He will highlight iterative approaches for initializing and successively optimizing model parameters in high-dimensional settings. Ivy Yuexin Zhang from Stanford University will present a delta-invariant method for feature selection that addresses the challenge of retrieving a stable signal in high-dimensional heterogeneous domains. Johannes Bleher from the University of Hohenheim will discuss a probabilistic procedure for variable selection when missing covariate data are handled through multiple imputation. He will evaluate the procedure through a Monte Carlo study under several missing data mechanisms and demonstrate its application to survey data. Aliaksandr Hubin from the University of Oslo will introduce the concept of active paths for accurately identifying true covariates in high-dimensional nonlinear systems. He will offer a novel perspective on sparse representations of latent binary Bayesian neural networks for identifying over-parameterized models. Finally, Valbona Bejleri from the United States Department of Agriculture's National Agricultural Statistics Service will conclude the session as discussant. She will summarize the innovations in high-dimensional methods, highlighting future research directions and opportunities for collaboration among statisticians from diverse backgrounds.

Sponsors:

Biometrics Section 2
Government Statistics Section 3
Section on Statistical Computing 1

Theme: Communities in Action: Advancing Society

Yes

Applied

Yes

Estimated Audience Size

Large (150-275)

I have read and understand that JSM participants must abide by the Participant Guidelines.

Yes

I understand and have communicated to my proposed speakers that JSM participants must register and pay the appropriate registration fee by June 1, 2026. The registration fee is nonrefundable.

I understand