Tuesday, Aug 5: 8:30 AM - 10:20 AM
0850
Topic-Contributed Paper Session
Music City Center
Room: CC-205C
Applied: No
Main Sponsor: Section on Risk Analysis
Co Sponsors: IMS
Presentations
In his 1996 paper, Talagrand highlighted that the Law of Large Numbers (LLN)
for independent random variables can be viewed as a geometric property of
multidimensional product spaces. This phenomenon is known as the concentration
of measure. To illustrate this profound connection between geometry and
probability theory, we consider a seemingly intractable geometric problem in
multidimensional Euclidean space and solve it using standard probabilistic
tools such as the LLN and the Central Limit Theorem (CLT).
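The abstract does not identify the geometric problem, so the sketch below substitutes a standard example of the same phenomenon (an assumption, not necessarily the speaker's problem): by the LLN, a uniform random point in the cube [-1, 1]^p lies close to the sphere of radius sqrt(p/3) with high probability, so nearly all of the cube's volume concentrates in a thin spherical shell.

```python
# Hypothetical illustration of concentration of measure (not necessarily the
# example used in the talk): by the LLN, a uniform point X in the cube
# [-1, 1]^p satisfies ||X||^2 / p -> E[X_1^2] = 1/3, so nearly all of the
# cube's volume sits in a thin shell around the sphere of radius sqrt(p/3).
import numpy as np

rng = np.random.default_rng(0)

for p in (10, 100, 10_000):
    X = rng.uniform(-1.0, 1.0, size=(5_000, p))     # 5000 uniform points in the cube
    radii = np.linalg.norm(X, axis=1) / np.sqrt(p)  # normalized distances to the origin
    print(f"p={p:6d}  mean={radii.mean():.4f}  sd={radii.std():.4f}  "
          f"target 1/sqrt(3)={1/np.sqrt(3):.4f}")

# The CLT refines the LLN statement: sqrt(p) * (||X||^2/p - 1/3) is
# approximately N(0, Var(X_1^2)) with Var(X_1^2) = 1/5 - 1/9 = 4/45.
```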
Speaker
Haim Bar, University of Connecticut
For a 1-factor model in high dimensions, we describe a shrinkage method that improves the sample estimate of the first principal component of the sample covariance matrix by a quantifiable amount. We prove asymptotic theorems as the dimension p tends to infinity while the number of samples n stays bounded. The improved estimator is shown to yield significantly better solutions to the minimum variance optimization problem subject to an arbitrary number of linear equality constraints. This is joint work with Lisa Goldberg and Hubeyb Gurdogan.
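As a rough illustration of the idea, and not the paper's estimator, the following sketch shrinks the leading sample eigenvector of a simulated 1-factor model toward a hypothetical anchor (the constant vector) with a hand-picked weight; the paper instead derives a data-driven, quantifiable improvement.

```python
# Minimal sketch of shrinking the leading sample eigenvector in a 1-factor
# model. The anchor q (the constant vector) and the weight w are assumptions
# made for illustration; the paper's estimator and its intensity may differ.
import numpy as np

rng = np.random.default_rng(1)
p, n = 2_000, 40                          # high dimension p, bounded sample size n

beta = 1.0 + 0.2 * rng.normal(size=p)     # true factor loadings
b = beta / np.linalg.norm(beta)           # population first principal component

Y = np.outer(beta, rng.normal(size=n)) + 3.0 * rng.normal(size=(p, n))
S = Y @ Y.T / n                           # p x p sample covariance
h = np.linalg.eigh(S)[1][:, -1]           # leading sample eigenvector
h *= np.sign(h @ b)                       # resolve the sign ambiguity

q = np.ones(p) / np.sqrt(p)               # hypothetical shrinkage anchor
w = 0.5                                   # hand-picked shrinkage weight
h_shrunk = (1 - w) * h + w * q
h_shrunk /= np.linalg.norm(h_shrunk)

print("alignment of sample eigenvector :", abs(h @ b))
print("alignment of shrunken estimator :", abs(h_shrunk @ b))
```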
We describe a version of Stein's paradox for the eigenvectors of a sample covariance matrix. It shows, much as Charles Stein did for the sample mean in the 1950s, that in high dimensions provably better estimators exist. We develop a Stein-type estimator for a spiked covariance model that shrinks the spiked eigenvectors toward an arbitrary low dimensional subspace. We prove that this estimator has strictly smaller mean-squared error in the high dimensional limit, leading to a more accurate low dimensional representation of the data. That this result holds even for a randomly chosen subspace highlights a new paradox in probability and statistics, one that is a consequence of the geometry of high dimensional spaces.
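The schematic below mimics one way such a shrinkage can work, under assumptions of our own: it shrinks the component of the leading sample eigenvector that lies outside a randomly chosen k-dimensional subspace, and searches a grid for the shrinkage intensity as a stand-in for whatever optimal intensity the paper derives.

```python
# Schematic of Stein-type shrinkage of the leading sample eigenvector toward
# a randomly chosen k-dimensional subspace. The setup and the grid search
# over the intensity c are assumptions, not the paper's construction.
import numpy as np

rng = np.random.default_rng(2)
p, n, k = 2_000, 40, 10

beta = 0.1 * (1.0 + 0.2 * rng.normal(size=p))  # weak factor: noisy eigenvector
b = beta / np.linalg.norm(beta)                # population spiked eigenvector

Y = np.outer(beta, rng.normal(size=n)) + rng.normal(size=(p, n))
h = np.linalg.eigh(Y @ Y.T / n)[1][:, -1]      # leading sample eigenvector
h *= np.sign(h @ b)                            # resolve the sign ambiguity

Q, _ = np.linalg.qr(rng.normal(size=(p, k)))   # random k-dim target subspace
Ph = Q @ (Q.T @ h)                             # projection of h onto it

def mse(c):
    hs = Ph + c * (h - Ph)   # shrink the component of h outside the subspace
    return float(np.sum((hs - b) ** 2))

print("raw eigenvector (c = 1) :", mse(1.0))
print("best shrunken estimator :", min(mse(c) for c in np.linspace(0, 1, 101)))
```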
Co-Author
Haim Bar, University of Connecticut
Speaker
Alex Shkolnik, University of California, Santa Barbara
Large covariance estimation plays a central role in high dimensional statistics. We suppose the pairwise correlations among variables are due to a small number of latent factors, which yields a spiked covariance structure in which a few large eigenvalues separate from the bulk spectrum. The associated eigenvectors form a population quantity of interest, denoted B, while their empirical counterparts, H, are typically obtained as the leading eigenvectors of the sample covariance matrix. The matrix B^T H (the transpose of B multiplied by H) captures the alignment between sample and population eigenvectors and provides a fine-grained measure of estimation accuracy. We establish an impossibility theorem showing that for finite sample size, in general, no consistent estimator of B^T H exists as the dimension grows to infinity. This finding has important implications for principal component analysis of high dimensional data with a small sample size, an increasingly relevant setting in data science.
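The short simulation below computes the alignment matrix B^T H in a spiked model with k = 2 factors (the spike sizes are assumptions for illustration). It can do so only because the simulation knows B; the abstract's point is that B^T H itself cannot be consistently estimated from the data alone in this regime.

```python
# Small simulation of the alignment matrix B^T H in a spiked covariance
# model with k = 2 latent factors. B is known here because we simulate it;
# the impossibility result concerns estimating B^T H from data alone.
import numpy as np

rng = np.random.default_rng(3)
p, n, k = 1_000, 50, 2

B, _ = np.linalg.qr(rng.normal(size=(p, k)))    # population spiked eigenvectors
spikes = np.array([50.0, 20.0])                 # spiked eigenvalues (assumption)
L = B * np.sqrt(spikes)                         # factor loadings, p x k

Z = rng.normal(size=(k, n))                     # latent factor realizations
Y = L @ Z + rng.normal(size=(p, n))             # data: covariance B diag(spikes) B^T + I
S = Y @ Y.T / n                                 # sample covariance

vals, vecs = np.linalg.eigh(S)
H = vecs[:, -k:][:, ::-1]                       # leading k sample eigenvectors

print("B^T H =\n", B.T @ H)                     # k x k alignment matrix
# Diagonal entries near +/-1 (signs are arbitrary) indicate good recovery;
# off-diagonal entries measure mixing between sample and population eigenvectors.
```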