Thursday, Aug 7: 10:30 AM - 12:20 PM
4231
Contributed Papers
Music City Center
Room: CC-207C
Main Sponsor
Section on Statistical Learning and Data Science
Presentations
Motivated by the increasing demand for multi-source data integration in various scientific fields, in this paper we study matrix completion in scenarios where the data exhibits certain block-wise missing structures -- specifically, where only a few noisy submatrices representing (overlapping) parts of the full matrix are available. We propose the Chain-linked Multiple Matrix Integration (CMMI) procedure to efficiently combine the information that can be extracted from these individual noisy submatrices. CMMI begins by deriving entity low-rank embeddings for each observed submatrix, then aligns these embeddings using overlapping entities between pairs of submatrices, and finally aggregates them to reconstruct the entire matrix of interest. We establish, under mild regularity conditions, entrywise error bounds and normal approximations for the CMMI estimates. Simulation studies and real data applications show that CMMI is computationally efficient and effective in recovering the full matrix, even when overlaps between the observed submatrices are minimal.
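To make the chaining idea concrete, the sketch below (our own illustration, not the authors' code; the toy block layout and helper names such as low_rank_embed and align are assumptions) embeds two overlapping noisy submatrices via truncated SVD, maps the second block's row and column embeddings into the first block's coordinate system by least squares on the shared entities, and reconstructs the full matrix from the stitched embeddings.

```python
import numpy as np

def low_rank_embed(A, r):
    """Rank-r SVD embedding of a noisy submatrix: A ~ (row embeddings) @ (col embeddings)^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r] * np.sqrt(s[:r]), Vt[:r].T * np.sqrt(s[:r])

def align(X_src, X_tgt):
    """Least-squares map sending source embeddings onto target embeddings,
    estimated on the entities the two submatrices share."""
    W, *_ = np.linalg.lstsq(X_src, X_tgt, rcond=None)
    return W

# Toy example: two overlapping noisy blocks of a rank-2 matrix M = L @ R^T.
rng = np.random.default_rng(0)
L, R = rng.normal(size=(60, 2)), rng.normal(size=(80, 2))
M = L @ R.T
A1 = M[:40, :50] + 0.05 * rng.normal(size=(40, 50))   # rows 0..39,  cols 0..49
A2 = M[30:, 40:] + 0.05 * rng.normal(size=(30, 40))   # rows 30..59, cols 40..79

U1, V1 = low_rank_embed(A1, r=2)
U2, V2 = low_rank_embed(A2, r=2)

# Align block 2 to block 1 using the overlapping rows (30..39) and columns (40..49).
U2a = U2 @ align(U2[:10], U1[30:40])
V2a = V2 @ align(V2[:10], V1[40:50])

# Stitch the embeddings over the full row/column index sets and reconstruct.
U_full = np.vstack([U1, U2a[10:]])   # 60 row embeddings
V_full = np.vstack([V1, V2a[10:]])   # 80 column embeddings
M_hat = U_full @ V_full.T
print("relative error:", np.linalg.norm(M_hat - M) / np.linalg.norm(M))
```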
Keywords
2→∞ norm
normal approximations
matrix completion
data integration
In this work, we propose a new semi-supervised method for multiple quantile regression. Traditional multiple quantile regression methods often suffer from quantile crossing, where a lower quantile estimate ends up being higher than a higher quantile estimate. To address this, we introduce a non-crossing penalty term that enforces the natural ordering of quantiles. Our framework naturally allows for regularization of the regression coefficient matrix. To compute our estimator, we utilize a splitting algorithm. In simulation studies, we show that our method can lead to improved performance over existing estimators.
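A stylized version of the penalized criterion is sketched below (our own illustration; the paper's estimator is computed with a splitting/ADMM-type algorithm, whereas a generic optimizer is used here, and the ridge term is only one possible regularizer for the coefficient matrix).

```python
import numpy as np
from scipy.optimize import minimize

def pinball(u, tau):
    """Quantile (check) loss."""
    return np.mean(np.maximum(tau * u, (tau - 1) * u))

def objective(beta_flat, X, y, taus, lam_cross, lam_ridge):
    B = beta_flat.reshape(X.shape[1], len(taus))   # one coefficient column per quantile level
    fits = X @ B                                   # n x K matrix of fitted quantiles
    loss = sum(pinball(y - fits[:, k], tau) for k, tau in enumerate(taus))
    # Non-crossing penalty: charge any violation of the natural ordering of quantiles.
    crossing = np.maximum(fits[:, :-1] - fits[:, 1:], 0.0)
    # Ridge term as a simple stand-in for regularization of the coefficient matrix.
    return loss + lam_cross * crossing.mean() + lam_ridge * np.sum(B ** 2)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 3))])
y = X[:, 1] + rng.normal(scale=1 + 0.5 * np.abs(X[:, 1]), size=200)
taus = (0.1, 0.5, 0.9)
res = minimize(objective, np.zeros(X.shape[1] * len(taus)),
               args=(X, y, taus, 10.0, 1e-3), method="L-BFGS-B")
B_hat = res.x.reshape(X.shape[1], len(taus))
```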
Keywords
Alternating direction method of multipliers
Constrained optimization
Quantile regression
Dimension reduction
As tensors become widespread in modern data analysis, Tucker low-rank Principal Component Analysis (PCA) has become essential for dimensionality reduction and structural discovery in tensor datasets. Motivated by the common scenario where large-scale tensors are distributed across diverse geographic locations, this paper investigates tensor PCA within a distributed framework where direct data pooling is theoretically suboptimal or practically infeasible. We offer a comprehensive analysis of three specific scenarios in distributed Tensor PCA: a homogeneous setting in which tensors at various locations are generated from a single noise-affected model; a heterogeneous setting where tensors at different locations come from distinct models but share some principal components, aiming to improve estimation across all locations; and a targeted heterogeneous setting, designed to boost estimation accuracy at a specific location with limited samples by utilizing transferred knowledge from other sites with ample data. We introduce novel estimation methods tailored to each scenario, establish statistical guarantees, and develop distributed inference techniques to construct confidence regions.
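As a point of reference for the homogeneous setting, the sketch below shows a generic one-shot distributed PCA step along a single tensor mode (a simplified stand-in, not the paper's estimator): each site computes its local leading subspace from a mode-1 unfolding, and a central node averages the local projection matrices and re-extracts the leading eigenvectors.

```python
import numpy as np

def local_subspace(T, r):
    """Local step: leading r left singular vectors of the mode-1 unfolding of a 3-way tensor."""
    U, _, _ = np.linalg.svd(T.reshape(T.shape[0], -1), full_matrices=False)
    return U[:, :r]

def aggregate(subspaces):
    """Central step: average the local projection matrices, then take leading eigenvectors."""
    P = sum(U @ U.T for U in subspaces) / len(subspaces)
    vals, vecs = np.linalg.eigh(P)
    return vecs[:, ::-1]   # eigenvectors in descending eigenvalue order

# Homogeneous toy setting: every site observes the same low-Tucker-rank signal plus noise.
rng = np.random.default_rng(2)
U0 = np.linalg.qr(rng.normal(size=(30, 3)))[0]   # shared mode-1 loadings
sites = []
for _ in range(5):
    core = rng.normal(size=(3, 20, 20))
    signal = np.einsum('ij,jkl->ikl', U0, core)
    sites.append(signal + 0.1 * rng.normal(size=signal.shape))

U_hat = aggregate([local_subspace(T, r=3) for T in sites])[:, :3]
print(np.linalg.norm(U_hat @ U_hat.T - U0 @ U0.T))   # projection distance to the truth
```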
Keywords
Tensor Principal Component Analysis
Distributed Inference
Data Heterogeneity
Communication Efficiency
Tucker Decomposition
This paper studies fine-grained singular subspace inference in the matrix denoising model where a deterministic low-rank signal matrix is additively perturbed by a stochastic matrix of independent Gaussian noise. We establish that the maximum Euclidean row norm of the aligned difference between the top-$r$ sample and population singular vector matrices approaches the Gumbel distribution in the large-matrix limit under suitable signal-to-noise conditions after appropriate centering and scaling. Our main results are obtained by a novel synthesis of entrywise matrix perturbation theory and saddle point approximation methods in statistics. The theoretical developments in this paper lead to methodology for hypothesis testing of low-rank signal structure encoded in the singular subspaces spanned by the top-$r$ singular vectors. To develop a data-driven inference procedure, shrinkage-type de-biased estimators are derived for the signal singular values. The features of our test include asymptotic control of the size and a power phase transition analysis under simple alternative structures.
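The test statistic itself is simple to form; the sketch below (our own illustration) computes the maximum Euclidean row norm of the Procrustes-aligned difference between sample and population singular vectors in a simulated matrix denoising model. The centering and scaling sequences needed to calibrate it against Gumbel quantiles, and the shrinkage-type de-biased singular value estimates, come from the paper's theory and are not reproduced here.

```python
import numpy as np

def aligned_rowwise_stat(U_hat, U):
    """Max Euclidean row norm of U_hat @ W - U, with W the orthogonal Procrustes
    alignment between the sample and population singular vector matrices."""
    A, _, Bt = np.linalg.svd(U_hat.T @ U)
    W = A @ Bt
    return np.max(np.linalg.norm(U_hat @ W - U, axis=1))

# Matrix denoising: rank-2 signal plus independent Gaussian noise.
rng = np.random.default_rng(3)
n, m, r = 400, 300, 2
U = np.linalg.qr(rng.normal(size=(n, r)))[0]
V = np.linalg.qr(rng.normal(size=(m, r)))[0]
M = U @ np.diag([50.0, 40.0]) @ V.T + rng.normal(size=(n, m))
U_hat = np.linalg.svd(M, full_matrices=False)[0][:, :r]

T = aligned_rowwise_stat(U_hat, U)   # after the paper's centering/scaling, approximately Gumbel
```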
Keywords
Singular subspace inference
Two-to-infinity norm
Gumbel convergence
Saddle point approximation
Singular value shrinkage
Layered models like neural networks appear to extract key features from data through empirical risk minimization, yet the theoretical understanding of this process remains unclear. Motivated by these observations, we study a two-layer nonparametric regression model where the input undergoes a linear transformation followed by a nonlinear mapping to predict the output, mirroring the structure of two-layer neural networks. In our model, both layers are optimized jointly through empirical risk minimization, with the nonlinear layer modeled by a reproducing kernel Hilbert space induced by a rotation and translation invariant kernel, regularized by a ridge penalty.
Our main result shows that the two-layer model can "automatically" induce regularization and facilitate feature learning. Specifically, the two-layer model promotes dimensionality reduction in the linear layer and identifies a parsimonious subspace of relevant features, even without applying any norm penalty on the linear layer. Notably, this regularization effect arises directly from the model's layered structure. Real-world data experiments further demonstrate the persistence of this phenomenon in practice.
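The following toy sketch (our own illustration under the stated model, with hypothetical names such as outer_objective) jointly fits a linear layer and a Gaussian-kernel ridge regression on the transformed inputs, with a ridge penalty on the RKHS layer only and no penalty on the linear layer.

```python
import numpy as np
from scipy.optimize import minimize

def rbf(Z, gamma=1.0):
    """Rotation- and translation-invariant (Gaussian) kernel on transformed inputs."""
    sq = np.sum(Z ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * Z @ Z.T))

def outer_objective(a_flat, X, y, d_hidden, lam):
    """Profile out the nonlinear layer: for a fixed linear layer A, kernel ridge
    regression has a closed form, so only A is optimized in the outer loop."""
    A = a_flat.reshape(X.shape[1], d_hidden)
    K = rbf(X @ A)
    n = len(y)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    resid = y - K @ alpha
    return np.mean(resid ** 2) + lam * alpha @ K @ alpha   # ridge penalty on f only, none on A

# Toy single-index data: the response depends on X only through one direction.
rng = np.random.default_rng(4)
X = rng.normal(size=(150, 5))
y = np.sin(X @ np.array([1.0, -1.0, 0.0, 0.0, 0.0])) + 0.1 * rng.normal(size=150)

res = minimize(outer_objective, rng.normal(scale=0.1, size=5 * 3),
               args=(X, y, 3, 1e-2), method="L-BFGS-B")
A_hat = res.x.reshape(5, 3)
# The fitted linear layer tends to concentrate on the relevant direction,
# illustrating the implicit dimensionality reduction described above.
```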
Keywords
layered models
regularization
feature learning
central mean subspace
reproducing kernel Hilbert space
ridge regression
We propose a framework for dimension reduction in high-dimensional regression, by aggregating an ensemble of random projections selected based on empirical regression performance. Specifically, we consider disjoint groups of independent random projections, apply a base regression method after each projection is applied to the covariates, and retain the best-performing projection in each group. The selected projections are aggregated by taking the singular value decomposition (SVD) of their empirical average, yielding the leading singular vectors. Notably, the singular values indicate the importance of the corresponding projection directions, aiding in selecting the final projection dimension. We provide recommendations on aspects of our framework, including the projection distribution, base regression method, and the number of random projections. Additionally, we explore further dimension reduction by applying our algorithm twice when the initially recommended dimension is too large. Our theoretical results show that the error of our algorithm stabilises as the number of projection groups increases. We demonstrate our proposal's strong empirical performance through an extensive study using simulated and real data.
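A compact sketch of this pipeline is given below (our own illustration; the helper names, the 1-nearest-neighbour base regression, and the choice to average the selected projections as p x p projection matrices before the SVD are assumptions, not the paper's prescriptions).

```python
import numpy as np

def loo_error(Z, y):
    """Leave-one-out error of 1-nearest-neighbour regression on the projected covariates
    (a simple stand-in for the base regression method)."""
    D = np.linalg.norm(Z[:, None] - Z[None, :], axis=2)
    np.fill_diagonal(D, np.inf)
    return np.mean((y - y[D.argmin(axis=1)]) ** 2)

def rp_ensemble(X, y, d=2, n_groups=50, group_size=10, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    p = X.shape[1]
    selected = []
    for _ in range(n_groups):
        # Disjoint group of independent random projections; keep the best performer.
        group = [np.linalg.qr(rng.normal(size=(p, d)))[0] for _ in range(group_size)]
        errors = [loo_error(X @ P, y) for P in group]
        selected.append(group[int(np.argmin(errors))])
    # Aggregate: SVD of the empirical average of the selected projections.
    avg = sum(P @ P.T for P in selected) / n_groups
    U, s, _ = np.linalg.svd(avg)
    return U, s   # leading singular vectors span the estimate; s guides the dimension choice

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1]) ** 2 + 0.1 * rng.normal(size=200)
U, s = rp_ensemble(X, y, rng=rng)
```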
Keywords
High-dimensional
mean central subspace
random projection
singular value decomposition
sufficient dimension reduction
Biclustering is an unsupervised machine-learning technique that simultaneously clusters rows and columns in a data matrix. It has been gaining increasing attention over the past two decades, driven by the growing complexity and volume of data in fields like genomics, transcriptomics, and other high-throughput omics technologies. However, discovering significant biclusters in large-scale datasets is an NP-hard problem. The accuracy and stability of most existing biclustering algorithms decrease significantly as dataset size increases, mainly due to the accumulation of noise in high-dimensional features and to non-convex optimization formulations. To address this, we propose a new method called sparse convex biclustering (SCB), which penalizes noise entries to zero in the process of biclustering. A tuning criterion based on clustering stability is developed to optimally balance cluster fitting and sparsity. We conduct comprehensive numerical studies using simulated data to demonstrate the superior performance of SCB in comparison to several state-of-the-art alternatives. Furthermore, we apply our method to the analysis of mouse olfactory bulb (MOB) data.
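For concreteness, the criterion SCB targets can be written down directly; the sketch below states a plain (unweighted) version of such an objective for illustration only, omitting the pairwise fusion weights as well as the ADMM solver and stability-based tuning used in practice.

```python
import numpy as np

def scb_objective(U, X, lam_row, lam_col, gamma):
    """Sparse convex biclustering criterion: data fit + fused row/column penalties
    that merge rows/columns into biclusters + an elementwise sparsity penalty
    that shrinks noise entries to zero. Pairwise fusion weights are omitted."""
    fit = 0.5 * np.sum((X - U) ** 2)
    row_fuse = sum(np.linalg.norm(U[i] - U[j])
                   for i in range(U.shape[0]) for j in range(i + 1, U.shape[0]))
    col_fuse = sum(np.linalg.norm(U[:, k] - U[:, l])
                   for k in range(U.shape[1]) for l in range(k + 1, U.shape[1]))
    return fit + lam_row * row_fuse + lam_col * col_fuse + gamma * np.sum(np.abs(U))
```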
Keywords
Convex biclustering
Sparsity
ADMM
High-dimensional data