KISS Student Paper Competition Award Section

Youjin Lee Chair
 
Youjin Lee Organizer
 
Tuesday, Aug 5: 2:00 PM - 3:50 PM
0706 
Topic-Contributed Paper Session 
Music City Center 
Room: CC-103C 
This session features five student paper competition awardees. The presentations cover recent developments in statistical methods for high-dimensional and censored data, deep generative models, dimension reduction techniques, and nonparametric regression methods.

Keywords

Student Paper Competition Award 

Applied

Yes

Main Sponsor

Korean International Statistical Society

Presentations

A Scalable Variational Bayes Approach to Fit High-dimensional Spatial Generalized Linear Mixed Models

Gaussian and discrete non-Gaussian spatial datasets are common across fields like public health, ecology, geosciences, and social sciences. Bayesian spatial generalized linear mixed models (SGLMMs) are a flexible class of models for analyzing such data, but they struggle to scale to large datasets. Many scalable Bayesian methods, built upon basis representations or sparse covariance matrices, still rely on posterior sampling via Markov chain Monte Carlo (MCMC). Variational Bayes (VB) methods have been applied to SGLMMs, but only for small areal datasets. We propose two computationally efficient VB approaches for analyzing moderately sized and massive (millions of locations) Gaussian and discrete non-Gaussian spatial data in the continuous spatial domain. Our methods leverage semi-parametric approximations of latent spatial processes and parallel computing to ensure computational efficiency. The proposed methods deliver inferential and predictive performance comparable to gold-standard MCMC methods while achieving computational speedups of up to 3600 times. In most cases, our VB approaches outperform state-of-the-art alternatives such as INLA and Hamiltonian Monte Carlo. We validate our methods through a comparative numerical study and applications to real-world datasets. These VB approaches can enable practitioners to model millions of discrete non-Gaussian spatial observations on standard laptops, significantly expanding access to advanced spatial modeling tools. 

Keywords

Spatial Statistics

Variational Inference

Basis Representation

Parallel Computing

Non-Gaussian Spatial Data

Statistical Computing 

Speaker

Jin Hyung Lee, George Mason University

Adaptive Quantile Regression for Doubly-Censored Data

Time-to-event data with a "double-censoring" structure, which includes exact observations along with left- and right-censored samples, frequently arise in biomedical and epidemiological studies. In this article, we present two efficient iterative algorithms for calculating regression quantiles in doubly-censored data settings: the Iterative Quantile Search Method (IQSM) and the Adaptive Quantile Loss via MM Algorithm (AQMM). To address this complex data structure, we develop an unbiased estimating function and the corresponding adaptive quantile loss function. This approach leverages the fact that the survival probability of the observed event time at a given quantile is a quantile-weighted average of the survival probabilities of the left- and right-censoring variables. We then reformulate the quantile loss function for doubly-censored data within the standard framework of weighted quantile regression, significantly simplifying computational requirements. The proposed estimators are shown to be consistent and asymptotically normal, ensuring robust theoretical properties. Extensive numerical studies validate the finite-sample performance of our methods, highlighting their efficiency and unbiasedness. Finally, we illustrate the practical utility of the proposed methods by analyzing the heterogeneous effects of various factors on the recovery time of COVID-19 patients. 
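As a rough illustration of the weighted quantile regression framework mentioned above, the following is a minimal sketch of the generic weighted check (pinball) loss and its intercept-only minimizer. It does not reproduce IQSM or AQMM. The function names and the weights are hypothetical, and the construction of censoring-adjusted weights is not given in the abstract. Weights are assumed nonnegative with a positive total, and tau is assumed to lie strictly between 0 and 1.

```python
import numpy as np


def check_loss(residual, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    residual = np.asarray(residual, dtype=float)
    return residual * (tau - (residual < 0))


def weighted_check_loss(q, y, weights, tau):
    """Weighted sum of check losses for a candidate quantile q."""
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * check_loss(y - q, tau)))


def weighted_quantile(y, weights, tau):
    """Smallest sample value whose cumulative weight reaches tau * total weight.

    This value is a minimizer of weighted_check_loss over q.
    """
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(y)
    cum = np.cumsum(weights[order])
    idx = np.searchsorted(cum, tau * cum[-1], side="left")
    return float(y[order][idx])


# Hypothetical data: the first observation carries most of the weight.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
weights = np.array([5.0, 1.0, 1.0, 1.0, 1.0])
q = weighted_quantile(y, weights, tau=0.5)
```

With covariates, the same loss would be minimized over regression coefficients rather than a single value q, typically by linear programming or an iterative scheme.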

Keywords

Adaptive quantile loss

Censored quantile regression

Double-censoring

Majorization-Minimization algorithm

Survival analysis 

Co-Author(s)

Yeji Kim, New York University, Division of Biostatistics, Department of Population Health
Sangbum Choi, Korea University, Department of Statistics
Seohyeon Park, Korea University, Department of Statistics

Speaker

Sangbum Choi, Korea University

Deep Discrete Encoders for Identifiable Generative Modeling

In the era of generative AI, deep generative models (DGMs) with latent representations have gained tremendous popularity. Despite their impressive empirical performance, the statistical properties of these models remain underexplored. DGMs are often overparametrized, non-identifiable, and uninterpretable black boxes, raising serious concerns when deploying them in high-stakes applications. Motivated by this, we propose an interpretable deep generative modeling framework for rich data types with discrete latent layers, called Deep Discrete Encoders (DDEs). A DDE is a directed graphical model with multiple binary latent layers. Theoretically, we propose transparent identifiability conditions for DDEs, which imply progressively smaller sizes of the latent layers as they go deeper. Identifiability ensures consistent parameter estimation and inspires an interpretable design of the deep architecture. Computationally, we propose a scalable estimation pipeline of a layerwise nonlinear spectral initialization followed by a penalized stochastic approximation EM algorithm. This procedure can efficiently estimate models with exponentially many latent components. Extensive simulation studies validate our theoretical results and demonstrate the proposed algorithms' excellent performance. We apply DDEs to three diverse real datasets for hierarchical topic modeling, image representation learning, and response time modeling in educational testing, and obtain interpretable findings.

Keywords

Identifiability

Interpretable AI

Graphical models 

Co-Author

Yuqi Gu, Columbia University

Speaker

Seunghyun Lee

Principal Component Analysis in the Graph Frequency Domain

We propose a novel principal component analysis in the graph frequency domain for dimension reduction of multivariate data residing on graphs. The proposed method not only effectively reduces the dimensionality of multivariate graph signals, but also provides a closed-form reconstruction of the original data. In addition, we investigate several propositions related to principal components and the reconstruction errors, and introduce a graph spectral envelope that aids in identifying common graph frequencies in multivariate graph signals. We demonstrate the validity of the proposed method through a simulation study and further analyze the boarding and alighting patterns of Seoul Metropolitan Subway passengers using the proposed method. 
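The graph frequency domain referred to above is commonly defined through the eigenvectors of a graph Laplacian. The following is a minimal sketch of that standard graph Fourier transform, assuming an undirected graph with a symmetric adjacency matrix. The 4-node path graph, the random signals, and the function names are illustrative and not taken from the paper, and the authors' PCA procedure itself is not reproduced here.

```python
import numpy as np


def graph_fourier_basis(adjacency):
    """Eigendecomposition of the combinatorial Laplacian L = D - A.

    Returns graph frequencies (eigenvalues, ascending) and the
    orthonormal Fourier basis (eigenvectors as columns).
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    freqs, basis = np.linalg.eigh(laplacian)
    return freqs, basis


def gft(basis, signals):
    """Graph Fourier transform: project node signals onto the basis."""
    return basis.T @ signals


def igft(basis, coeffs):
    """Inverse graph Fourier transform: map coefficients back to nodes."""
    return basis @ coeffs


# Illustrative 4-node path graph: 0 - 1 - 2 - 3.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Three hypothetical signals observed on the four nodes (rows = nodes).
rng = np.random.default_rng(0)
signals = rng.normal(size=(4, 3))

freqs, basis = graph_fourier_basis(adjacency)
coeffs = gft(basis, signals)          # rows indexed by graph frequency
reconstructed = igft(basis, coeffs)   # recovers the node-domain signals
```

Because the basis is orthonormal, the inverse transform recovers the signals exactly, up to floating-point error. Any dimension reduction carried out on the frequency-domain coefficients can therefore be mapped back to the nodes.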

Keywords

Dimension reduction

Graph frequency domain

Graph signal processing

Multivariate graph signal

PCA 

Co-Author

Hee-Seok Oh, Seoul National University

Speaker

Kyusoon Kim, Seoul National University, Department of Statistics

Totally Concave Regression

Shape constraints offer compelling advantages in nonparametric regression by enabling the estimation of regression functions under realistic assumptions, devoid of tuning parameters. However, most existing shape-constrained nonparametric regression methods, except additive models, impose too few restrictions on the regression functions. This often leads to suboptimal performance, such as overfitting, in multivariate contexts due to the curse of dimensionality. On the other hand, additive shape-constrained models are sometimes too restrictive because they fail to capture interactions among the covariates. In this paper, we introduce a novel approach for multivariate shape-constrained nonparametric regression, which allows interactions without suffering from the curse of dimensionality. Our approach is based on the notion of total concavity originally due to T. Popoviciu. We discuss the characterization and computation of the least squares estimator over the class of totally concave functions and derive rates of convergence under standard assumptions. The rates of convergence depend on the number of covariates only logarithmically, and the estimator, therefore, is guaranteed to avoid the curse of dimensionality to some extent. We demonstrate that total concavity can be justified for many real-world examples and validate the efficacy of our approach through empirical studies on various datasets. 

Keywords

Interaction effect modeling

Mixed partial derivatives

Multivariate convex regression

Nonnegative least squares

Popoviciu's convex functions

Shape-constrained estimation 

Co-Author

Adityanand Guntuboyina, University of California, Berkeley

Speaker

Dohyeong Ki, Department of Statistics, University of California, Berkeley