An impossibility theorem for principal component analysis

Hubeyb Gurdogan Co-Author
University of California, Los Angeles
 
Hubeyb Gurdogan Speaker
University of California, Los Angeles
 
Tuesday, Aug 5: 9:35 AM - 9:55 AM
Topic-Contributed Paper Session 
Music City Center 
Large covariance estimation plays a central role in high-dimensional statistics. We suppose the pairwise correlations among variables are due to a small number of latent factors, which yields a spiked covariance structure where a few large eigenvalues separate from a bulk spectrum. The associated eigenvectors form a population quantity of interest, denoted B, while their empirical counterparts, H, are typically estimated via the leading eigenvectors of the sample covariance matrix. The matrix B^T H (the transpose of B multiplied by H) captures the alignment between sample and population eigenvectors and provides a fine-grained measure of estimation accuracy. We establish an impossibility theorem showing that, for a finite sample size, in general no consistent estimator of B^T H exists as the dimension grows to infinity. This finding has important implications for principal component analysis of high-dimensional data with a small sample size, an increasingly relevant setting in data science.
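
To make the objects in the abstract concrete, the following is a minimal simulation sketch, not the authors' method. It assumes a Gaussian factor model with unit-variance noise and spike eigenvalues that grow linearly with the dimension; the function name `alignment_matrix`, its parameters, and the spike scaling are all hypothetical choices made for illustration. It draws data with a spiked covariance, takes H as the leading eigenvectors of the sample covariance, and returns B^T H.

```python
import numpy as np

def alignment_matrix(p, n, q=2, seed=0):
    """Simulate a spiked model and return the q x q matrix B^T H.

    Assumptions (illustrative only): Gaussian factors and noise,
    unit noise variance, spike eigenvalues proportional to p.
    """
    rng = np.random.default_rng(seed)

    # Population eigenvectors B: p x q with orthonormal columns.
    B, _ = np.linalg.qr(rng.standard_normal((p, q)))

    # Hypothetical spike eigenvalues q*p, (q-1)*p, ..., p.
    spikes = p * np.arange(q, 0, -1, dtype=float)

    # Data matrix Y (p variables, n observations): factors plus noise.
    F = rng.standard_normal((q, n))
    E = rng.standard_normal((p, n))
    Y = B @ (np.sqrt(spikes)[:, None] * F) + E

    # The left singular vectors of Y are the eigenvectors of the
    # (uncentered) sample covariance Y Y^T / n; keep the leading q.
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    H = U[:, :q]

    return B.T @ H

if __name__ == "__main__":
    # Small sample size, growing dimension.
    for p in (100, 1000, 5000):
        print(p)
        print(alignment_matrix(p=p, n=20))
```

Because this simulation knows B, it can compute B^T H directly. The theorem concerns the practical setting in which only the data are observed and B^T H must be estimated.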