Advances in Exact Subsampling Methods with Linear Regression Models
Sunday, Aug 3: 2:05 PM - 2:20 PM
2045
Contributed Papers
Music City Center
With the dramatic rise of automatic data collection, a huge volume of data is recorded on a daily basis. Despite the potential of big data, it is computationally expensive to fit traditional regression models to datasets with billions of rows. This motivates the use of Optimal Design Based (ODB) subsampling, which identifies a subset that optimizes a criterion typically used in experimental design. Existing methods, such as Information-Based Optimal Subdata Selection (IBOSS), focus on the D-optimality criterion, which minimizes the generalized variance of the parameter estimates. While this is helpful for parameter estimation, little attention has been given to criteria that favor model prediction, such as the I-optimality criterion, which minimizes the average prediction variance. In this paper, we propose new algorithms that identify I-optimal subsamples from massive datasets. These algorithms lead to computationally efficient and reliable prediction for linear regression models. We also extend the algorithms to the case of heteroscedastic errors. Case studies illustrate that the proposed methods have smaller prediction error than existing methods.
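To make the two criteria concrete, the following is a minimal sketch, not the algorithms proposed in the paper. It assumes a first-order linear model with an intercept, simulated Gaussian covariates, and an I-criterion averaged over the rows of the full dataset; the function names `d_criterion` and `i_criterion` and the crude "largest-norm rows" selection rule are hypothetical choices made only for illustration.

```python
import numpy as np

def d_criterion(X_sub):
    """Log-determinant of the information matrix X'X (larger is better)."""
    sign, logdet = np.linalg.slogdet(X_sub.T @ X_sub)
    return logdet if sign > 0 else -np.inf

def i_criterion(X_sub, X_region):
    """Average of x'(X'X)^{-1}x over the rows of X_region (smaller is better).

    This is the prediction variance up to the factor sigma^2,
    averaged over the points where predictions are wanted.
    """
    M_inv = np.linalg.inv(X_sub.T @ X_sub)
    return float(np.mean(np.einsum("ij,jk,ik->i", X_region, M_inv, X_region)))

# Simulated full data (assumption: standard normal covariates).
rng = np.random.default_rng(0)
n, p, k = 10_000, 3, 200
Z = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), Z])  # model matrix with intercept

# Two candidate subsamples of size k.
random_idx = rng.choice(n, size=k, replace=False)
# Illustrative rule only: keep the k rows farthest from the origin.
extreme_idx = np.argsort(np.linalg.norm(Z, axis=1))[-k:]

d_random = d_criterion(X[random_idx])
d_extreme = d_criterion(X[extreme_idx])
i_random = i_criterion(X[random_idx], X)
i_extreme = i_criterion(X[extreme_idx], X)

print(f"random : log det = {d_random:.2f}, avg pred var = {i_random:.4f}")
print(f"extreme: log det = {d_extreme:.2f}, avg pred var = {i_extreme:.4f}")
```

The sketch only evaluates the criteria for two fixed subsamples; searching for a subsample that optimizes either criterion over billions of rows is the hard part that subsampling algorithms address.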
Experiment Design
Big Data
Subsampling
I-optimality
Main Sponsor
Section on Statistical Computing