Inference for High-Dimensional Data with Partially Missing Returns
Likai Chen
Speaker
Washington University in St Louis
Monday, Aug 3: 9:35 AM - 9:55 AM
Invited Paper Session
Thomas M. Menino Convention & Exhibition Center
We propose a new methodology for valid coefficient inference in financial factor models when the response panel of returns is partially missing. The leading application is alpha inference in a Fama-French factor regression, where the parameter of interest is a fund's or stock's risk-adjusted return $\alpha_i$, but returns are missing because of fund closure, delisting, sparse trading, or asynchronous markets. The framework combines a static factor backbone (estimated via PCA on observed returns) with an optional diffusion-based residual correction to construct a synthetic prediction $S_{i,t+1}$ for every unit, whether its return is observed or missing. The prediction is never substituted for the response; instead, it is residualized against the factor-model design $W_{it}$ on the full panel and enters a labeled-sample regression as an orthogonalized auxiliary control. We establish unbiasedness and asymptotic normality for the coefficient estimator $\hat{\bfbeta}$ (and therefore for $\hat{\alpha}_i$), and we derive the efficiency gain $\mathrm{ARE} = 1/(1 - \pi_T \rho^2)$, where $\pi_T$ is the missing fraction and $\rho$ is the partial correlation between target and surrogate. Monte Carlo simulations show that the gain is realized only when the surrogate uses information beyond the design $W_{it}$. The framework offers valid alpha inference in settings where matrix completion \citep{bryzgalova2024missing} delivers prediction quality without inferential validity, and it complements the cross-sectional bootstrap methods of \citet{kosowski2006can} and \citet{fama2010luck}.
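The two-step construction described in the abstract (residualize the synthetic prediction against the design on the full panel, then include it as a control in the labeled-sample regression) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `are` and `surrogate_adjusted_ols`, the simulated data, the missing-completely-at-random mask, and the way the surrogate is generated are all assumptions made for illustration, and the PCA backbone and diffusion-based correction are replaced by a simple noisy proxy.

```python
import numpy as np


def are(missing_fraction, rho):
    """Efficiency gain ARE = 1 / (1 - pi_T * rho^2), as stated in the abstract."""
    return 1.0 / (1.0 - missing_fraction * rho ** 2)


def surrogate_adjusted_ols(W, y, S, observed):
    """Hypothetical helper: labeled-sample OLS with a residualized surrogate.

    W        : (n, k) design (intercept + factors), available for every row
    y        : (n,) response, used only where `observed` is True
    S        : (n,) synthetic prediction, available for every row
    observed : (n,) boolean mask marking rows with an observed response
    """
    # Step 1: residualize S against W using the full panel.
    gamma, *_ = np.linalg.lstsq(W, S, rcond=None)
    S_tilde = S - W @ gamma

    # Step 2: regress y on [W, S_tilde] using labeled rows only.
    X = np.column_stack([W[observed], S_tilde[observed]])
    coef, *_ = np.linalg.lstsq(X, y[observed], rcond=None)
    return coef[:-1], coef[-1]  # (coefficients on W, coefficient on S_tilde)


# Simulated illustration (all values below are assumptions).
rng = np.random.default_rng(0)
n = 2000
factors = rng.normal(size=(n, 3))
W = np.column_stack([np.ones(n), factors])    # first column carries alpha
beta_true = np.array([0.5, 1.0, -0.5, 0.3])   # [alpha, factor loadings]

z = rng.normal(size=n)                        # signal not contained in W
y = W @ beta_true + z + 0.5 * rng.normal(size=n)

# The surrogate uses information beyond W (it sees a noisy version of z).
S = W @ beta_true + z + 0.5 * rng.normal(size=n)

observed = rng.random(n) > 0.4                # roughly 40% of responses missing
y_masked = np.where(observed, y, np.nan)      # missing responses are never used

beta_hat, delta_hat = surrogate_adjusted_ols(W, y_masked, S, observed)
print("estimated [alpha, loadings]:", beta_hat)
print("coefficient on residualized surrogate:", delta_hat)
print("ARE at pi_T = 0.5, rho = 0.8:", are(0.5, 0.8))
```

For instance, with a missing fraction of $\pi_T = 0.5$ and partial correlation $\rho = 0.8$, the stated formula gives $\mathrm{ARE} = 1/(1 - 0.32) \approx 1.47$. If the surrogate were a function of $W_{it}$ alone, its residual after Step 1 would be zero, which is consistent with the abstract's remark that the gain requires information beyond the design.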