Tuesday, Aug 5: 8:30 AM - 10:20 AM
4086
Contributed Papers
Music City Center
Room: CC-104D
Main Sponsor
Section on Statistical Computing
Presentations
L1 penalized quantile regression (PQR) is used in many fields as an alternative to penalized least squares regression for data analysis. Existing algorithms for PQR either use linear programming, which does not scale well in high dimensions, or an approximate coordinate descent (CD) that does not solve for the exact coordinatewise minimum of the nonsmooth loss function. Further, neither approach leverages the sparsity structure of the problem in large-scale datasets. To avoid the computational challenges associated with the nonsmooth quantile loss, some recent works have even advocated using smooth approximations to the exact problem. In this work, we develop a fast, pathwise CD algorithm that computes exact L1 PQR estimates for data of any dimension. We derive an easy-to-compute exact solution for the coordinatewise nonsmooth loss minimization, which, to the best of our knowledge, has not been reported in the literature. We also employ a random perturbation to help the algorithm avoid getting stuck along the regularization path. On simulated and real-world datasets, we show that our algorithm runs substantially faster than existing alternatives while retaining the same level of estimation accuracy.
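The coordinatewise subproblem behind such an update can be verified directly. The sketch below is an illustration only, not the paper's closed-form update: it exploits the fact that the L1-penalized quantile objective is piecewise linear and convex in a single coordinate, so its exact minimum lies at a breakpoint, and simply searches those breakpoints by brute force. Function names and the search strategy are assumptions made for this sketch.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile (check) loss: sum of u * (tau - I(u < 0))."""
    return np.sum(u * (tau - (u < 0)))

def coordinate_min_quantile_l1(r, x, tau, lam):
    """Exact minimizer over b of sum_i rho_tau(r_i - x_i * b) + lam * |b|.

    The objective is piecewise linear and convex in b, so its minimum is
    attained at one of the breakpoints {0} union {r_i / x_i : x_i != 0}.
    This brute-force search is O(n^2) per coordinate; it only illustrates
    exactness, not the fast update derived in the paper.
    """
    nz = x != 0
    candidates = np.concatenate(([0.0], r[nz] / x[nz]))
    objective = lambda b: check_loss(r - x * b, tau) + lam * abs(b)
    values = [objective(b) for b in candidates]
    return candidates[int(np.argmin(values))]
```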
Keywords
LASSO
penalized quantile regression
coordinate descent
pathwise algorithm
Dimension reduction techniques play a significant role in analyzing high-dimensional data, especially in fields like radiomics, where extracting meaningful patterns from complex datasets is essential. This study evaluates the performance of Principal Component Analysis (PCA), Isomap, and t-Distributed Stochastic Neighbor Embedding (t-SNE) in preserving data structure based on average silhouette scores. Through extensive simulations, we compare these methods across datasets with varying sample sizes (n = 100, 200, 300, 400, 500), noise levels (σ² = 0.25, 0.5, 0.75, 1, 1.5, 2), and feature counts (p = 20, 50, 100, 200, 300, 400). Our findings indicate that for datasets with an underlying linear structure, PCA achieves the highest accuracy in maintaining cluster integrity, as measured by the average silhouette score. Conversely, for nonlinear data structures, Isomap and t-SNE outperform PCA in preserving meaningful relationships.
One important application of these findings is in radiomics, where high-dimensional imaging data is used to extract quantitative biomarkers for cancer diagnosis and prognosis.
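A minimal version of such a comparison can be run with scikit-learn. The settings below (a single Gaussian-blob configuration) are hypothetical placeholders for the study's full grid of sample sizes, noise levels, and feature counts.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, TSNE
from sklearn.metrics import silhouette_score

# Hypothetical simulation cell: n = 300, p = 50, three clusters.
X, labels = make_blobs(n_samples=300, n_features=50, centers=3,
                       cluster_std=1.0, random_state=0)

reducers = {
    "PCA": PCA(n_components=2),
    "Isomap": Isomap(n_components=2),
    "t-SNE": TSNE(n_components=2, random_state=0),
}

for name, reducer in reducers.items():
    embedding = reducer.fit_transform(X)
    # Average silhouette score of the known clusters in the embedding.
    score = silhouette_score(embedding, labels)
    print(f"{name}: average silhouette = {score:.3f}")
```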
Keywords
Dimension Reduction Techniques
Linear and Nonlinear Data Structures
Radiomics
Principal Component Analysis (PCA)
Isomap
t-Distributed Stochastic Neighbor Embedding (t-SNE)
The need to model higher-dimensional data, such as in a tensor-variate framework where each observation is a three-dimensional object, is growing due to rapid improvements in computational power and data storage capabilities. In this study, a finite mixture of hidden Markov models for tensor-variate time series data is developed. Simulation studies demonstrate high classification accuracy for both cluster and regime IDs. To further validate the usefulness of the proposed model, it is applied to real-life data with promising results.
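The forward-backward recursion listed among the keywords below is, in its generic form, a short computation. The sketch assumes a single HMM with given transition matrix and per-time emission log-likelihoods; the tensor-variate emission densities and the mixture over HMMs, which are the paper's contribution, are not shown, and the function signature is an assumption.

```python
import numpy as np

def forward_backward(log_lik, trans, init):
    """Scaled forward-backward recursion for a single HMM.

    log_lik : (T, K) per-time, per-state emission log-likelihoods
              (in the tensor-variate model these would come from a
               tensor-variate density; here they are generic inputs).
    trans   : (K, K) transition matrix, rows summing to one.
    init    : (K,) initial state distribution.
    Returns the (T, K) smoothed state (regime) probabilities.
    """
    T, K = log_lik.shape
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))

    alpha = np.zeros((T, K))
    c = np.zeros(T)                      # scaling factors
    alpha[0] = init * lik[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * lik[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]

    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = (trans @ (lik[t + 1] * beta[t + 1])) / c[t + 1]

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```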
Keywords
Finite Mixture model
Hidden Markov model
Forward-backward algorithm
tensor-variate time series
We aim to estimate and conduct inference for the effects of multiple covariates of interest simultaneously, after adjusting for the effects of high-dimensional control variables under a multivariate linear model setting. A chi-square statistic is proposed, based on the residuals obtained from fitting the response variables and the target covariates to the control covariates via regularized estimation. Procedures for hypothesis testing and confidence interval construction are developed. The proposed procedures mitigate the potential overfitting errors that regularized estimation introduces into inference on the target parameters and account for the inherent interconnectivity among the response variables.
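One generic residual-based construction of this flavor, shown purely for illustration and not necessarily the proposed statistic, residualizes both the multivariate response and a single target covariate on the controls with the lasso and then forms a Wald-type chi-square; the function and its details are assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def residual_chisq(Y, d, Z):
    """Illustrative residual-based chi-square: test whether a single target
    covariate d is associated with the multivariate response Y after
    adjusting for high-dimensional controls Z via lasso residualization.
    (An assumed construction in the spirit of the abstract only.)
    """
    n, q = Y.shape
    # Residualize the target covariate on the controls with the lasso.
    d_res = d - LassoCV(cv=5).fit(Z, d).predict(Z)
    # Residualize each response on the controls with the lasso.
    Y_res = np.column_stack(
        [y - LassoCV(cv=5).fit(Z, y).predict(Z) for y in Y.T]
    )
    # OLS of residualized responses on the residualized target covariate.
    denom = d_res @ d_res
    B = (d_res @ Y_res) / denom                 # (q,) coefficient vector
    E = Y_res - np.outer(d_res, B)              # residual matrix
    Sigma = (E.T @ E) / n                       # response covariance estimate
    # Wald-type statistic; asymptotically chi-square with q d.o.f. under H0.
    return denom * (B @ np.linalg.solve(Sigma, B))
```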
Keywords
High dimensional Inference
Multivariate
Hotelling-Lawley trace
Principal components computed via PCA are traditionally used to reduce dimensionality in genomic data or correct for population stratification. In this paper, we explore the penalized eigenvalue problem (PEP), which reformulates the first eigenvector computation as an optimization problem, adding an L1 penalty to enforce sparsity. Our contribution is threefold. First, we extend PEP by applying Nesterov smoothing to the LASSO-type L1 penalty, enabling analytical gradient computation for faster, more efficient minimization of the objective function. Second, we illustrate how higher-order eigenvectors can be computed with PEP using established SVD results. Third, we present experimental studies demonstrating the utility of smoothed penalized eigenvectors compared to other state-of-the-art methods. Using 1000 Genomes Project data, we empirically show that our smoothed PEP improves numerical stability and yields meaningful eigenvectors. We employ the PEP approach in further real-data applications (polygenic risk score computation and clustering), showing that exchanging the penalized eigenvectors for their smoothed counterparts enhances prediction accuracy and cluster discernibility.
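A stripped-down sketch of the smoothed penalized eigenvector idea, under the assumption of a Huber-type Nesterov smoothing of the absolute value and a plain projected-gradient update on the unit sphere, might look like the following; the step-size schedule, deflation for higher-order eigenvectors, and the SVD machinery of the actual method are not reproduced.

```python
import numpy as np

def smoothed_abs_grad(v, mu):
    """Gradient of the Nesterov-smoothed |.| (Huber-like surrogate with
    smoothing parameter mu): clip(v / mu, -1, 1)."""
    return np.clip(v / mu, -1.0, 1.0)

def sparse_leading_eigvec(A, lam, mu=1e-3, step=1e-2, iters=5000, seed=0):
    """Projected gradient ascent on v'Av - lam * sum_j |v_j|_mu over the
    unit sphere (a simplified sketch of a smoothed penalized eigenvalue
    problem, not the paper's exact algorithm)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        grad = 2.0 * A @ v - lam * smoothed_abs_grad(v, mu)
        v = v + step * grad
        v /= np.linalg.norm(v)          # project back onto the unit sphere
    return v
```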
Keywords
Principal Component Analysis
Eigenvector
Smoothing
Genomic Relationship Matrix
Singular Value Decomposition
Nesterov
Deep neural networks (DNNs) have been widely used for real-world regression tasks, but applying them to high-dimensional, low-sample-size data presents unique challenges. Existing approaches often prioritize sparse linear relationships before extending to the full DNN structure, which can overlook important nonlinear associations. The problem becomes even more complex when selecting the network architecture, such as determining the optimal number of layers and neurons. This study addresses these challenges by linking neuron selection in DNNs to knot placement in basis expansion techniques and additive modeling, introducing a sparsity-inducing difference penalty. This penalty automates knot selection and promotes parsimony in neuron activations, resulting in an efficient and scalable fitting method with automated architecture selection. The proposed method, named Sparse Deep P-Spline, is validated through numerical studies, demonstrating its ability to efficiently detect sparse nonlinear relationships. Applications to the analysis of computer experiments are also presented.
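The difference penalty that P-splines are built on is, in its classical ridge-type form, a few lines of code. The sketch below shows only that building block (the Whittaker-Eilers smoother); the sparsity-inducing variant and its use for neuron selection inside a DNN, which are the paper's contribution, are not reproduced.

```python
import numpy as np

def whittaker_smoother(y, lam=10.0, order=2):
    """Difference-penalized least squares (Whittaker-Eilers smoother):
    minimize ||y - z||^2 + lam * ||D z||^2, where D is the `order`-th
    difference matrix. P-splines apply the same kind of penalty to
    basis coefficients; the sparsity-inducing variant in the abstract
    is not shown here.
    """
    n = len(y)
    D = np.diff(np.eye(n), n=order, axis=0)      # (n - order, n) difference matrix
    return np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)
```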
Keywords
Deep Smoothing Regression
Additive Models
Feature Selection
Fast Tuning Algorithm
In this work, we develop a novel variational inference framework for a regularized multivariate regression model that integrates latent clustering with advanced low-rank regression techniques. We demonstrate the utility of our method through simulation studies and an application to county-level COVID-19 outcomes, the Social Vulnerability Index (SVI), and non-pharmaceutical interventions (NPIs) in Florida. Our experiments show that the proposed framework not only enhances model flexibility and computational scalability but also offers valuable insights for targeted interventions, particularly in identifying vulnerable groups.
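The low-rank regression component alone can be illustrated with classical reduced-rank regression, used here as an assumed stand-in; the latent clustering, regularization, and variational inference of the proposed framework are not shown.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical reduced-rank multivariate regression: fit OLS, then project
    the fitted values onto their top `rank` right singular directions.
    Only the low-rank building block is illustrated here.
    """
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)   # (p, q) OLS coefficients
    fitted = X @ B_ols
    # Right singular vectors of the fitted values define the rank-r projector.
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    P = Vt[:rank].T @ Vt[:rank]                     # (q, q) projection matrix
    return B_ols @ P                                # rank-constrained coefficients
```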
Keywords
Low-Rank Regression
Variational Inference
Social Vulnerability