When cross-validation meets Cook's distance
Wednesday, Aug 7: 11:50 AM - 12:05 PM
3674
Contributed Papers
Oregon Convention Center
We introduce a new feature selection method for regression models based on cross-validation (CV) and Cook's distance (CD). Leave-one-out (LOO) CV measures how far the LOO fitted values fall from the observed responses, while CD measures how far they fall from the full-data fitted values. CV selects a model based on its prediction accuracy and often selects overfitted models. To improve on CV, we account for model robustness through CD, which can be shown to be effective in identifying overfitted models. We therefore propose a linear combination of the CV error and the average Cook's distance as a feature selection criterion. Under mild assumptions, we show that the probability that this criterion selects the true model in least-squares linear regression converges to 1, which is not the case for CV. Our simulation studies also demonstrate that the criterion performs significantly better than CV in feature selection for both linear regression and penalized linear regression. The criterion requires no extra computation beyond CV, because CD uses the same fitted values that CV needs.
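As a rough illustration of why the two quantities share the same computation, the sketch below evaluates the LOO CV error and the average Cook's distance for an ordinary least-squares fit from the full-data residuals and leverages, using the standard closed-form identities. The function names `loo_cv_and_cook` and `combined_criterion` and the weight `lam` are hypothetical; the abstract does not state the weight or the exact scaling the authors use, so this is a sketch under those assumptions and not their criterion.

```python
import numpy as np

def loo_cv_and_cook(X, y):
    """Return (LOO CV error, average Cook's distance) for an OLS fit of y on X.

    X is assumed to have full column rank and to already contain an
    intercept column if one is wanted.
    """
    n, p = X.shape
    # Thin QR: the squared row norms of Q are the leverages h_ii.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    resid = y - Q @ (Q.T @ y)        # full-data residuals
    loo_resid = resid / (1.0 - h)    # y_i minus its LOO fitted value
    cv_error = np.mean(loo_resid**2)
    s2 = resid @ resid / (n - p)     # residual variance estimate
    # Cook's distance: scaled squared shift of all fitted values
    # when observation i is left out.
    cook = loo_resid**2 * h / (p * s2)
    return cv_error, cook.mean()

def combined_criterion(X, y, lam=1.0):
    """Hypothetical criterion: CV error + lam * average Cook's distance."""
    cv_error, avg_cd = loo_cv_and_cook(X, y)
    return cv_error + lam * avg_cd
```

To compare candidate feature subsets with this sketch, one would evaluate `combined_criterion` on the design matrix of each subset and keep the subset with the smallest value.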
Cook's distance
Cross-validation
Linear regression
Model robustness
Model selection
Main Sponsor
Section on Statistical Learning and Data Science