The Geometry of Estimation in High Dimensions

Chair

Rajeshwari Sundaram, National Institute of Child Health and Human Development

Organizer

Alex Shkolnik, University of California, Santa Barbara

Tuesday, Aug 5: 8:30 AM - 10:20 AM
0850 
Topic-Contributed Paper Session 
Music City Center 
Room: CC-205C 

Applied

Main Sponsor

Section on Risk Analysis

Co-Sponsors

IMS

Presentations

High Dimensional Space Oddity

In his 1996 paper, Talagrand highlighted that the Law of Large Numbers (LLN)
for independent random variables can be viewed as a geometric property of
multidimensional product spaces. This phenomenon is known as the concentration
of measure. To illustrate this profound connection between geometry and
probability theory, we consider a seemingly intractable geometric problem in
multidimensional Euclidean space and solve it using standard probabilistic
tools such as the LLN and the Central Limit Theorem (CLT). 
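
The abstract leaves the specific geometric problem unstated, but the phenomenon it invokes is easy to check numerically. The short Python sketch below is only a generic illustration of concentration of measure (the sample sizes and dimensions are arbitrary choices, not taken from the talk): by the LLN, standard Gaussian points in p dimensions concentrate near the sphere of radius sqrt(p), so the rescaled norm has mean close to 1 and a spread that shrinks as p grows.

import numpy as np

rng = np.random.default_rng(0)
for p in (10, 100, 10_000):
    X = rng.standard_normal((1_000, p))            # 1,000 independent standard Gaussian points in R^p
    r = np.linalg.norm(X, axis=1) / np.sqrt(p)     # Euclidean norms, rescaled by sqrt(p)
    print(p, r.mean().round(4), r.std().round(4))  # mean stays near 1, spread shrinks as p grows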

Speaker

Haim Bar, University of Connecticut

James-Stein shrinkage for high dimensional eigenvectors

For a 1-factor model in high dimensions, we describe a shrinkage method that improves the sample estimate of the first principal component of the sample covariance matrix by a quantifiable amount. We prove asymptotic theorems as the dimension p tends to infinity while the number of samples n stays bounded. The improved estimator is shown to yield a significantly better solution to the estimated minimum variance optimization problem subject to an arbitrary number of linear equality constraints. This is joint work with Lisa Goldberg and Hubeyb Gurdogan.
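
As a rough sketch of the mechanics only (this is not the estimator from the talk: the shrinkage target and the weight c below are ad hoc choices, whereas the talk quantifies the improvement), the following Python snippet simulates a 1-factor model with p much larger than n, extracts the leading sample eigenvector, and shrinks it toward a fixed direction before comparing angles to the true component.

import numpy as np

rng = np.random.default_rng(1)
p, n = 2_000, 50                                 # dimension much larger than sample size
b = np.ones(p) + 0.3 * rng.standard_normal(p)
b /= np.linalg.norm(b)                           # "true" first principal component (unknown in practice)
Y = np.outer(4.0 * rng.standard_normal(n), b) + rng.standard_normal((n, p))  # 1-factor data, n x p
G = Y @ Y.T / n                                  # n x n Gram matrix; shares its nonzero spectrum with the sample covariance
_, U = np.linalg.eigh(G)
h = Y.T @ U[:, -1]
h /= np.linalg.norm(h)                           # leading eigenvector of the sample covariance (the usual PCA estimate)
target = np.ones(p) / np.sqrt(p)                 # fixed shrinkage target chosen for this sketch
h *= np.sign(h @ target)                         # resolve the sign ambiguity of the eigenvector
c = 0.5                                          # ad hoc shrinkage weight, not the quantified amount from the talk
h_shrunk = c * target + (1.0 - c) * h
h_shrunk /= np.linalg.norm(h_shrunk)

def angle(u, v):                                 # angle between unit vectors, in degrees, ignoring sign
    return np.degrees(np.arccos(np.clip(abs(u @ v), 0.0, 1.0)))

print("angle to truth: PCA", angle(b, h).round(1), "deg, shrunk", angle(b, h_shrunk).round(1), "deg")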

Co-Author(s)

Lisa Goldberg, University of California, Berkeley
Alec Kercheval, Florida State University
Hubeyb Gurdogan, University of California, Los Angeles

Speaker

Alec Kercheval, Florida State University

Stein's Paradox for Eigenvectors of Large Covariance Matrices

We describe a version of Stein's paradox for eigenvectors of a sample covariance matrix. It shows, much as Charles Stein did for the sample mean in the 1950s, that provably better estimators exist in high dimensions. We develop a Stein-type estimator for a spiked covariance model that shrinks the spiked eigenvectors toward an arbitrary low dimensional subspace. We prove that this estimator has a strictly smaller mean-squared error in the high dimensional limit, leading to a more accurate low dimensional representation of the data. That this result holds even for a randomly chosen subspace highlights a new paradox in probability and statistics, one that is a consequence of the geometry of high dimensional spaces.
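
A small numerical aside may help explain why a randomly chosen subspace makes the result surprising. The Python check below is a generic illustration of high dimensional geometry, not of the estimator in the talk (p and k are arbitrary): the projection of a fixed unit vector onto a random k-dimensional subspace of R^p has length of order sqrt(k/p), so such a subspace is nearly orthogonal to any given signal direction.

import numpy as np

rng = np.random.default_rng(2)
p, k = 10_000, 5
b = rng.standard_normal(p)
b /= np.linalg.norm(b)                             # a fixed unit vector (the "signal" direction)
W, _ = np.linalg.qr(rng.standard_normal((p, k)))   # orthonormal basis of a random k-dimensional subspace
proj_len = np.linalg.norm(W.T @ b)                 # length of the projection of b onto the random subspace
print(proj_len.round(4), np.sqrt(k / p).round(4))  # both are small and of the same order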

Co-Author

Haim Bar, University of Connecticut

Speaker

Alex Shkolnik, University of California, Santa Barbara

An impossibility theorem for principal component analysis

Large covariance matrix estimation plays a central role in high dimensional statistics. We suppose the pairwise correlations among variables are due to a small number of latent factors, which yields a spiked covariance structure in which a few large eigenvalues separate from a bulk spectrum. The associated eigenvectors form a population quantity of interest, denoted B, while their empirical counterpart, H, is typically taken to be the matrix of leading eigenvectors of the sample covariance matrix. The matrix B^T H (the transpose of B multiplied by H) captures the alignment between the sample and population eigenvectors and provides a fine-grained measure of estimation accuracy. We establish an impossibility theorem showing that, for finite sample size, no consistent estimator of B^T H exists in general as the dimension grows to infinity. This finding has important implications for principal component analysis of high dimensional data with a small sample size, an increasingly relevant setting in data science.
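
For readers new to the notation, the short Python sketch below simulates a generic spiked covariance model with arbitrary parameters; it only shows what B, H, and the alignment matrix B^T H look like, and does not illustrate the impossibility result itself.

import numpy as np

rng = np.random.default_rng(3)
p, n, k = 1_000, 30, 2                             # dimension, sample size, number of latent factors
B, _ = np.linalg.qr(rng.standard_normal((p, k)))   # population spiked eigenvectors, p x k
spikes = np.array([25.0, 9.0])                     # variances of the latent factors
F = rng.standard_normal((n, k)) * np.sqrt(spikes)  # factor scores, n x k
Y = F @ B.T + rng.standard_normal((n, p))          # data with covariance B diag(spikes) B^T + I
S = Y.T @ Y / n                                    # p x p sample covariance
_, vecs = np.linalg.eigh(S)
H = vecs[:, -k:][:, ::-1]                          # leading k sample eigenvectors, p x k
print(np.round(B.T @ H, 3))                        # the k x k alignment matrix B^T H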

Co-Author

Hubeyb Gurdogan, University of California, Los Angeles

Speaker

Hubeyb Gurdogan, University of California, Los Angeles