Optimal Data Splitting

Roshan Joseph Co-Author
School of ISYE, Georgia Tech
 
Youngseo Cho Speaker
 
Thursday, Aug 7: 8:35 AM - 9:00 AM
Invited Paper Session 
Music City Center 
It is common to split a dataset into a training set and a testing set when building statistical and machine learning models. In this talk, we discuss deterministic methods for optimally splitting a dataset. SPlit and Twinning are two such methods; both aim to split the dataset so that the resulting sets have similar distributional characteristics. We propose a new method for creating a testing set that not only preserves the distribution of the data but is also difficult to predict.
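To make the idea of a deterministic, distribution-preserving split concrete, here is a minimal Python sketch. It is not the SPlit or Twinning algorithm, and it is not the new method proposed in the talk. It is a simplified nearest-neighbor grouping heuristic, assumed here purely for illustration: the data are walked through in groups of `r` nearby points, and one point from each group goes to the testing set. The function names `neighbor_group_split` and `energy_distance`, the ratio `r`, and the synthetic data are all hypothetical choices. The energy distance is used only as one possible measure of how closely the testing set matches the full dataset.

```python
import numpy as np

def energy_distance(a, b):
    """Energy distance between two samples (rows are points), V-statistic form."""
    def mean_dist(x, y):
        diff = x[:, None, :] - y[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()
    return 2 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b)

def neighbor_group_split(X, r=5):
    """Illustrative deterministic split (an assumption, not SPlit or Twinning).

    Repeatedly take an anchor point, group it with its nearest remaining
    neighbors (r points in total), send one point of the group to the
    testing set, and send the rest to the training set.
    """
    n = len(X)
    remaining = np.ones(n, dtype=bool)
    test_idx = []
    current = 0  # fixed starting point, so the split is reproducible
    while True:
        idx = np.flatnonzero(remaining)
        d = np.linalg.norm(X[idx] - X[current], axis=1)
        group = idx[np.argsort(d, kind="stable")[:r]]
        # group[0] is the remaining point closest to the anchor
        # (normally the anchor itself).
        test_idx.append(group[0])
        remaining[group] = False
        if not remaining.any():
            break
        # Next anchor: the remaining point nearest to the farthest group member.
        idx = np.flatnonzero(remaining)
        d = np.linalg.norm(X[idx] - X[group[-1]], axis=1)
        current = idx[np.argmin(d)]
    test_idx = np.array(sorted(test_idx))
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    return train_idx, test_idx

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

train_idx, test_idx = neighbor_group_split(X, r=5)
random_test_idx = rng.choice(len(X), size=len(test_idx), replace=False)

print("test size:", len(test_idx), "train size:", len(train_idx))
print("energy distance, grouped split:", energy_distance(X[test_idx], X))
print("energy distance, random split: ", energy_distance(X[random_test_idx], X))
```

A smaller energy distance means the testing set resembles the full dataset more closely. The sketch makes no claim about which split scores lower on a given dataset, and it does not address the second goal mentioned above, namely a testing set that is difficult to predict.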

Keywords

training set

testing set

validation

experimental design