Thursday, Aug 6: 10:30 AM - 12:20 PM
6264
Contributed Papers
Main Sponsor
Section on Statistical Learning and Data Science
Presentations
Despite remarkable advances in neuroimaging, current analytical frameworks still struggle to achieve two essential goals: clinical interpretability and computational efficiency, particularly when handling high-dimensional brain data. Although 2D approaches remain widely used, slice-by-slice analysis often fails to capture volumetric continuity, limiting the detection of subtle abnormalities that span slices. Conversely, fully 3D CNN-based models demand excessive computation and memory. To overcome these limitations, we propose a 3D Adaptive Spatial Key-Region Identification (ASKRI) method that achieves both interpretability and efficiency. In this framework, key regions are adaptively enhanced within a Restricted Adjacency-Dependent Mixture Dirichlet Process model, improving interpretability while supporting clinical diagnostics. Applied to brain imaging, the method not only identifies key regions (e.g., the fornix) with high classification accuracy but also isolates clinically meaningful and diagnostically informative ROIs, thereby providing a time-efficient and reliable tool for neuroimaging analysis.
Keywords
3D Convolutional Neural Networks
Adaptive Spatial Key-Region Identification (ASKRI)
Universal Kriging
Restricted Adjacency Matrix
Diffusion Tensor Imaging (DTI)
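The computation-and-memory argument against fully 3D CNNs can be made concrete with a quick parameter count: a 3×3×3 volumetric kernel carries roughly three times the weights of its 3×3 slice-wise counterpart. The channel sizes below are illustrative, not from the paper.

```python
# Quick parameter count contrasting a slice-wise 2D convolution with its
# fully volumetric 3D counterpart; channel sizes are hypothetical.

def conv_params(kernel, c_in, c_out):
    """Weights in a conv layer: prod(kernel) * c_in * c_out, plus c_out biases."""
    n = c_in * c_out
    for k in kernel:
        n *= k
    return n + c_out

p2d = conv_params((3, 3), 64, 128)      # 3x3 kernel applied slice-by-slice
p3d = conv_params((3, 3, 3), 64, 128)   # 3x3x3 volumetric kernel
print(p2d, p3d, p3d / p2d)              # the 3D kernel carries ~3x the weights
```

The same factor compounds per layer, and 3D activations must also be held in memory for entire volumes rather than single slices.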
In many applications, weighted networks are constructed from time series data. A time series is associated with each vertex, and edge weights are given by correlations between time series. This induces dependency among the edges, violating the assumptions of most common network models. Nonetheless, it is common to apply network embedding methods to networks built from correlation data. In this work, we show that this violation of assumptions is not critical. Provided that the time series under study are expressible in terms of a small number of orthogonal sequences, the adjacency spectral embedding provably recovers the true time series. That is, the adjacency spectral embedding applied to correlation networks serves as a denoising process, analogous to principal components analysis. In addition, we show that under suitable sparsity assumptions on the frequency domain, the embedding learned by the adjacency spectral embedding recovers the Fourier coefficients of the true signals. This fact appears to be folklore in the signal processing community in the context of principal component analysis, but it is, to the best of our knowledge, new to the networks literature.
Keywords
Networks
Embeddings
Correlation matrix
Spectral methods
Time series
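The denoising claim can be illustrated with a minimal simulation (sizes, noise level, and the recovery metric below are arbitrary illustrative choices, not from the paper): each vertex's series is a linear combination of a few orthonormal sequences plus noise, and the low-rank spectral decomposition of the correlation matrix recovers the signal subspace.

```python
import numpy as np

# Each vertex's series is a linear combination of d orthonormal sequences plus
# noise; the rank-d spectral embedding of the correlation matrix recovers the
# signal subspace, much like PCA.
rng = np.random.default_rng(0)
n, T, d = 200, 500, 3

Q, _ = np.linalg.qr(rng.standard_normal((T, d)))   # d orthonormal sequences
W = rng.standard_normal((n, d))                    # loadings, one row per vertex
W /= np.linalg.norm(W, axis=1, keepdims=True)
X = W @ Q.T + 0.05 * rng.standard_normal((n, T))   # noisy series at each vertex

C = np.corrcoef(X)                                 # edge weights = correlations
vals, vecs = np.linalg.eigh(C)
U = vecs[:, -d:]                                   # rank-d spectral embedding

# Smallest cosine of the principal angles between estimated and true subspaces
W_orth, _ = np.linalg.qr(W)
sv = np.linalg.svd(W_orth.T @ U, compute_uv=False)
print(sv.min())    # near 1: the embedding has denoised down to the signal space
```

The correlation matrix here plays the role of the (weighted) adjacency matrix, so the top eigenvectors are exactly the adjacency spectral embedding of the correlation network.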
We introduce a general inferential framework for comparing predictor importance in classification models with categorical responses. Our approach is based on the categorical Gini correlation (CGC), a dependence measure between numerical and categorical variables that captures the significance of a predictor for the response. To compare the importance of two predictors with respect to the same categorical outcome, we conduct hypothesis tests on their CGCs. The framework accommodates predictors of arbitrary and unequal dimensionalities. We derive the asymptotic distribution of the test statistic for hypothesis testing and show that the test is consistent. In addition, we propose a nonparametric bootstrap procedure as an alternative to the asymptotic normal-based test. Simulation studies demonstrate the empirical performance of the proposed tests, and applications to two real datasets illustrate their practical utility.
Keywords
Categorical Gini correlation
Comparing correlations
Classification
Predictor importance
Categorical response
Nonparametric bootstrap
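A rough sketch of the ingredients, using the standard distance-based form of the categorical Gini correlation and a naive percentile bootstrap; the paper's actual test statistic, its asymptotic calibration, and the data below are not reproduced here.

```python
import numpy as np

# Distance-based categorical Gini correlation:
#   rho_g(X, y) = (Delta - sum_k p_k * Delta_k) / Delta,
# where Delta is the mean pairwise Euclidean distance of X overall and Delta_k
# the mean within class k. A naive percentile bootstrap then compares two
# predictors of unequal dimension. All data are synthetic.
rng = np.random.default_rng(1)

def mean_pairwise_dist(X):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    m = len(X)
    return D.sum() / (m * (m - 1))

def gini_cor(X, y):
    delta = mean_pairwise_dist(X)
    within = sum((y == k).mean() * mean_pairwise_dist(X[y == k])
                 for k in np.unique(y))
    return (delta - within) / delta

n = 150
y = rng.integers(0, 2, n)
X1 = y[:, None] + 0.5 * rng.standard_normal((n, 2))   # informative, 2-dim
X2 = rng.standard_normal((n, 3))                      # pure noise, 3-dim

obs = gini_cor(X1, y) - gini_cor(X2, y)
boot = [gini_cor(X1[idx], y[idx]) - gini_cor(X2[idx], y[idx])
        for idx in (rng.integers(0, n, n) for _ in range(200))]
lo = np.percentile(boot, 2.5)
print(obs, lo)   # a bootstrap interval excluding 0 favors X1 over X2
```

Note that the two predictors enter only through pairwise distances, which is what lets the comparison accommodate arbitrary and unequal dimensionalities.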
Calculation of the Chained CPI-U requires monthly item-area expenditure shares from the same time period as the item-area price index relatives. Yet cell-level expenditure data only become available four quarters after the price data. In the interim, the BLS issues a preliminary estimate of the index using a Constant Elasticity of Substitution model. We propose an alternative method for preliminary estimation that instead retains the Törnqvist formula for aggregation and forecasts the missing item-area expenditure data using a set of hierarchical Echo State Networks (ESNs), a class of Recurrent Neural Networks in which the reservoir and input couplings are randomized.
ESNs are flexible, nonlinear, hidden variable models that can predict series with complex temporal dynamics after a relatively simple training process. We develop an iterative procedure to forecast a vector of item expenditures for a given area based on its past expenditure data as well as past and concurrent price data. Additionally, we include the option to supplement the ESN neuron states with discrete Fourier modes at the seasonal frequencies to improve prediction among items with strong seasonal components.
Keywords
Time series
Price indices
Echo State Networks
Neural Networks
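A generic single-reservoir ESN (not the hierarchical configuration proposed above) can be sketched in a few lines: the reservoir and input couplings are drawn at random and left untrained, and only the linear readout is fitted. The toy seasonal series and all sizes below are illustrative.

```python
import numpy as np

# Generic echo state network: a fixed random reservoir driven by the input,
# with only the linear readout trained by ridge regression.
rng = np.random.default_rng(2)
T, n_res = 300, 100
u = np.sin(2 * np.pi * np.arange(T) / 12)[:, None]  # period-12 "monthly" series
y = np.roll(u, -1, axis=0)                          # target: next value

W_in = 0.5 * rng.uniform(-1, 1, (n_res, 1))         # randomized input coupling
W = rng.uniform(-1, 1, (n_res, n_res))              # randomized reservoir
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))     # spectral radius < 1

states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):                                  # drive the reservoir forward
    x = np.tanh(W_in @ u[t] + W @ x)
    states[t] = x

wash, ridge = 50, 1e-6                              # discard transient, then fit
S, Y = states[wash:-1], y[wash:-1]
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ Y)
pred = (states[-1] @ W_out).item()                  # one-step-ahead forecast
print(pred, y[-1].item())
```

The "simple training process" in the abstract is exactly the ridge step: only `W_out` is learned. The proposed supplement of the neuron states with discrete Fourier modes would amount to appending seasonal sinusoid columns to `states` before the readout fit.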
Traditional classification methods like k-nearest neighbors (kNN) are widely used in practical applications and have demonstrated effectiveness under the assumption of well-observed networks with known labels. However, in practice, networks are frequently not fully observed due to anonymization, data collection inaccuracies, or missing information, resulting in estimated or entirely unknown node labels. This lack of information can compromise statistical inference when methods rely heavily on label-specific attributes. We investigate the impact of node shuffling on classification performance within a Stochastic Block Model framework. Specifically, we use kNN combined with Procrustes alignment of latent positions to classify graphs from two groups differing by a perturbation. Our empirical and theoretical results reveal that in the homogeneous case, the classification rate declines as the number of shuffled vertices increases. However, for a sufficiently large perturbation, a change point occurs at which the classification rate resurges. Notably, a reflection is observed in the Procrustes alignment at this point, which becomes more pronounced with increasing perturbation.
Keywords
Stochastic Block Model (SBM)
Node shuffling
k-Nearest Neighbors (kNN)
Graph Classification
Procrustes Alignment
Adjacency Spectral Embedding (ASE)
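The pipeline can be sketched on toy data (block probabilities, graph counts, and the leave-one-out 1-NN rule below are illustrative choices, and vertex correspondence is taken as known, i.e., no shuffling): graphs from two SBMs differing by a block-probability perturbation, an adjacency spectral embedding per graph, and classification by nearest neighbor under the Procrustes-aligned distance.

```python
import numpy as np

# Toy version of the classification pipeline: SBM graphs -> ASE -> orthogonal
# Procrustes alignment -> leave-one-out 1-nearest-neighbor classification.
rng = np.random.default_rng(3)
n = 100
z = np.repeat([0, 1], n // 2)                     # two equal blocks

def sbm(B):
    P = B[z][:, z]                                # edge probabilities by block
    A = np.triu((rng.random((n, n)) < P), 1).astype(float)
    return A + A.T

def ase(A, d=2):
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[-d:]           # d largest-magnitude eigenpairs
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def procrustes_dist(U, V):
    # Frobenius distance after the best orthogonal rotation of U onto V
    Q, _, Rt = np.linalg.svd(U.T @ V)
    return np.linalg.norm(U @ (Q @ Rt) - V)

B0 = np.array([[0.5, 0.2], [0.2, 0.5]])
B1 = B0 + 0.2                                     # the perturbation
graphs = [ase(sbm(B0)) for _ in range(20)] + [ase(sbm(B1)) for _ in range(20)]
labels = np.repeat([0, 1], 20)

correct = 0
for i in range(40):
    dists = [procrustes_dist(graphs[i], graphs[j]) if j != i else np.inf
             for j in range(40)]
    correct += labels[np.argmin(dists)] == labels[i]
acc = correct / 40
print(acc)
```

The Procrustes step is needed because ASE latent positions are identifiable only up to an orthogonal transformation; node shuffling would additionally permute the rows of each embedding, which is the degradation the abstract studies.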
Gaussian graphical models in the spectral domain provide a principled framework for identifying conditional dependence structures in stationary high-dimensional time series. Inference for the spectral precision matrix (SPM) at a fixed frequency is challenging because estimation requires smoothing across frequencies, while spectral-domain observations, i.e., discrete Fourier transforms, are only asymptotically independent, have non-sparse precision matrices, and exhibit finite-sample biases that invalidate standard i.i.d. precision matrix inference. We propose an inference framework for sparse high-dimensional SPMs. Our method constructs a debiased complex graphical lasso (deCGLASSO) estimator at a specified frequency. Using asymptotic theory for quadratic forms of stationary multivariate time series, we establish asymptotic normality of the debiased estimator. For each matrix entry, we develop an estimator of the asymptotic covariance by aggregating information across neighboring frequencies. The key theoretical contribution is explicit control of the regularization, truncation, and smoothing biases. We demonstrate the method's empirical performance on simulated data and real fMRI data.
Keywords
Graphical models
Precision matrix estimation
High-dimensional time series
Spectral domain inference
Debiased estimators
Confidence intervals
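For orientation, the object being estimated can be sketched with a plain smoothed-periodogram estimate and direct inversion; this naive baseline has no graphical-lasso penalty and no debiasing, and the simulated series below are illustrative only.

```python
import numpy as np

# Naive spectral precision estimate: average periodogram ordinates over
# neighboring Fourier frequencies, then invert. Off-diagonal entries of the
# inverse encode conditional dependence at that frequency.
rng = np.random.default_rng(4)
T, p = 2048, 4
e = rng.standard_normal((T, p))
X = e.copy()
X[:, 1] += 0.8 * np.roll(e[:, 0], 1)    # series 1 depends on lagged series 0
X -= X.mean(axis=0)

D = np.fft.fft(X, axis=0) / np.sqrt(T)  # DFT ordinates, one row per frequency

def spm_hat(j, bandwidth=40):
    """Smooth I(f_k) = d(f_k) d(f_k)^H over 2*bandwidth+1 frequencies near j."""
    S = np.zeros((p, p), dtype=complex)
    for k in range(j - bandwidth, j + bandwidth + 1):
        d = D[k % T]
        S += np.outer(d, d.conj())
    return np.linalg.inv(S / (2 * bandwidth + 1))

Theta = spm_hat(T // 8)                 # precision estimate at frequency pi/4
print(abs(Theta[0, 1]), abs(Theta[2, 3]))  # dependent pair vs independent pair
```

The smoothing bandwidth here is the source of the smoothing bias the abstract refers to, and in high dimensions the raw inverse is unusable, which is what motivates the penalized estimator and its debiasing.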
Given multiple data matrices, many problems in statistics and data science rely on estimating a common subspace that captures certain structure shared by all the data matrices. In this talk we investigate the statistical and computational limits for the common subspace model, in which one observes a collection of symmetric low-rank matrices perturbed by noise, where each low-rank matrix shares the same common subspace. Our main results identify several regimes of the signal-to-noise ratio (SNR) such that estimation and inference are statistically or computationally optimal, and we refer to these regimes as weak SNR, moderate SNR, strong estimation SNR, and strong inference SNR. Consequently, our results unveil a novel phenomenon: despite the SNR being "above" the computational limit for estimation, adaptive statistical inference may still be information-theoretically impossible.
Keywords
Spectral methods
Multilayer networks
Matrix analysis
Random matrix theory
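A toy instance of the common subspace model, estimated with a simple aggregation rule (top eigenvectors of the sum of squared layers); this is one standard baseline, not necessarily the optimal procedure from the talk, and all sizes and signal strengths below are illustrative.

```python
import numpy as np

# Common subspace model: each observed layer is U diag(lam_l) U^T plus
# symmetric Gaussian noise, with the column space U shared across layers.
# Summing the squared layers and taking top eigenvectors recovers U.
rng = np.random.default_rng(5)
n, r, L, sigma = 120, 3, 15, 1.0

U, _ = np.linalg.qr(rng.standard_normal((n, r)))  # common subspace
A = []
for _ in range(L):
    lam = rng.uniform(5, 10, r) * np.sqrt(n)      # per-layer signal strengths
    E = rng.standard_normal((n, n)) * sigma
    A.append(U @ np.diag(lam) @ U.T + (E + E.T) / np.sqrt(2))

agg = sum(M @ M for M in A)                       # aggregate squared layers
vals, vecs = np.linalg.eigh(agg)
U_hat = vecs[:, -r:]                              # estimated common subspace

# Smallest cosine of the principal angles between U_hat and the truth
sv = np.linalg.svd(U.T @ U_hat, compute_uv=False)
print(sv.min())   # near 1 indicates accurate subspace recovery
```

Squaring before summing keeps the per-layer signals from cancelling when the `lam_l` differ in sign or magnitude across layers, which is why this aggregation is natural for multilayer network problems.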