13 Scalable M-Open Model Selection in Large Data Settings

Bruno Sanso Co-Author
University of California-Santa Cruz
 
Jacob Fontana First Author, Presenting Author
 
Tuesday, Aug 6: 10:30 AM - 12:20 PM
3154 
Contributed Posters 
Oregon Convention Center 
We consider the variable selection problem for linear models in the M-open setting, where the data generating process lies outside the model space. We focus on the novel problem of Model Superinduction: the tendency of model selection procedures to favor larger models exponentially as the sample size grows, resulting in overparameterized models that induce severe computational difficulties. We prove the existence of this phenomenon for popular classes of model selection priors, such as mixtures of g-priors and the family of spike-and-slab priors. We further show that this behavior is inescapable for any KL-divergence-minimizing model selection procedure, so we seek to mitigate its effects for large n while preserving posterior consistency. We propose variants of the aforementioned priors that yield a slowly diminishing rate of prior influence on the posterior, favoring simpler models while preserving consistency. We further propose a model space prior that induces stronger model complexity penalization at large sample sizes. We demonstrate the efficacy of our proposed solutions via synthetic data examples and a case study using albedo data from GOES satellites.
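
The behavior the abstract describes can be illustrated with standard tools; the sketch below is not the authors' construction. It uses the closed-form Zellner g-prior Bayes factor against the intercept-only model, with the unit-information choice g = n, for a nested family of polynomial regression models fit to data whose true mean (a sine function) lies outside the model space. The sine truth, noise level, polynomial model space, and g = n are illustrative assumptions rather than details from the paper.

```python
# Minimal sketch (illustrative assumptions, not the authors' code):
# under an M-open data generating process, g-prior model selection
# shifts toward larger linear models as the sample size n grows.
import numpy as np

rng = np.random.default_rng(0)

def log_bf_gprior(y, X, g):
    """Log Bayes factor of the linear model with design X (plus an intercept)
    against the intercept-only null model, under Zellner's g-prior."""
    n, p = X.shape
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    beta_hat, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    rss = np.sum((yc - Xc @ beta_hat) ** 2)
    r2 = 1.0 - rss / np.sum(yc ** 2)
    # Closed form (Liang et al., 2008): (1+g)^{(n-1-p)/2} * [1 + g(1-R^2)]^{-(n-1)/2}
    return 0.5 * (n - 1 - p) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

for n in [100, 1_000, 10_000, 100_000]:
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + rng.normal(scale=0.5, size=n)  # truth lies outside the model space
    # Candidate models: polynomials in x of degree 1 through 6.
    log_bfs = np.array([
        log_bf_gprior(y, np.column_stack([x ** j for j in range(1, d + 1)]), g=n)
        for d in range(1, 7)
    ])
    post = np.exp(log_bfs - log_bfs.max())
    post /= post.sum()  # posterior over the six polynomial models under a uniform model prior
    print(f"n = {n:>6}: modal degree = {int(post.argmax()) + 1}, "
          f"P(degree >= 5) = {post[4:].sum():.3f}")
```

Running this, the posterior mode typically sits on a low-degree model at small n and migrates to the richest adequate models as n increases, mirroring the exponential preference for larger models that the proposed priors are designed to temper.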

Keywords

Model selection

Bayesian decision theory

M-open model comparison

Linear models

Spike-and-slab prior

g-prior


Main Sponsor

Section on Bayesian Statistical Science