An impossibility theorem for principal component analysis

Presented During: The Geometry of Estimation in High Dimensions

Hubeyb Gurdogan Co-Author
University of California, Los Angeles

Hubeyb Gurdogan Speaker
University of California, Los Angeles

Tuesday, Aug 5: 9:35 AM - 9:55 AM
Topic-Contributed Paper Session

Music City Center

Large covariance estimation plays a central role in high-dimensional statistics. We suppose the pairwise correlations among variables are due to a small number of latent factors, which yields a spiked covariance structure in which a few large eigenvalues separate from a bulk spectrum. The associated eigenvectors form a population quantity of interest, denoted B, while their empirical counterparts, H, are typically obtained as the leading eigenvectors of the sample covariance matrix. The matrix B^T H (the transpose of B multiplied by H) captures the alignment between sample and population eigenvectors and provides a fine-grained measure of estimation accuracy. We establish an impossibility theorem showing that, for a finite sample size, in general no consistent estimator of B^T H exists as the dimension grows to infinity. This finding has important implications for principal component analysis of high-dimensional data with a small sample size, an increasingly relevant setting in data science.
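
To make the objects B, H, and B^T H concrete, the following is a minimal simulation sketch and not the authors' construction. It assumes a simple factor model with Gaussian factors and unit-variance Gaussian noise, with spike eigenvalues that grow in proportion to the dimension p. The function name `spiked_alignment` and the parameters `k`, `spike_scale`, and `seed` are hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def spiked_alignment(p, n, k=2, spike_scale=1.0, seed=0):
    """Simulate a k-factor spiked model and return (B, H, B^T H).

    Assumed model (illustrative, not taken from the abstract):
    X = F B^T + E, with Gaussian factors F and standard Gaussian noise E.
    """
    rng = np.random.default_rng(seed)

    # Population eigenvectors B: a random p x k matrix with orthonormal columns.
    B, _ = np.linalg.qr(rng.standard_normal((p, k)))

    # Assumption: factor variances scale with p and are distinct,
    # so the k spikes separate from the bulk and from each other.
    factor_sd = np.sqrt(spike_scale * p * np.arange(k, 0, -1))
    factors = rng.standard_normal((n, k)) * factor_sd

    # n observations of p variables.
    X = factors @ B.T + rng.standard_normal((n, p))

    # The leading eigenvectors of the (uncentered) sample covariance X^T X / n
    # are the top right singular vectors of X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    H = Vt[:k].T

    return B, H, B.T @ H


if __name__ == "__main__":
    # Fixed sample size, growing dimension: the regime the abstract describes.
    for p in (200, 2000):
        _, _, alignment = spiked_alignment(p=p, n=20, seed=1)
        print(p, np.round(alignment, 3))
```

In a simulation, B is known, so B^T H can be computed directly; the impossibility result concerns estimating it from the data alone, where only H is observed. The singular values of B^T H are the cosines of the principal angles between the sample and population eigenspaces, so they always lie between 0 and 1.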