Effect of Training Data Quality on Classifier Performance

Jeanne Ruane Co-Author
University of Pennsylvania
 
Alan Karr First Author
Temple University
 
Alan Karr Presenting Author
Temple University
 
Sunday, Aug 3: 3:20 PM - 3:35 PM
2253 
Contributed Papers 
Music City Center 
When the quality of the training data underlying a classifier is
degraded, several effects arise: on the boundary structure of the classifier,
on its performance on the training data, and on its performance on validation
data. We illustrate these effects in the context of metagenomic assembly of
short DNA reads, each arising from one of three genomes, for four classifiers:
a naive Bayes classifier, a partition model, a random forest, and a neural net.
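To make the read-classification setting concrete, here is a minimal sketch of one of the four classifier types: a multinomial naive Bayes model that assigns a short DNA read to one of several source genomes from its overlapping k-mer counts. The abstract does not specify features or settings, so the k-mer representation, `k=3`, the Laplace pseudocount `alpha`, and the names `kmers` and `KmerNaiveBayes` are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Return the overlapping k-mers of a DNA string (hypothetical feature choice)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts; an illustrative sketch only."""

    def __init__(self, k=3, alpha=1.0):
        self.k = k
        self.alpha = alpha  # Laplace smoothing pseudocount (assumed value)

    def fit(self, reads, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.counts = {c: Counter() for c in self.classes}
        n_reads = Counter(labels)
        for read, label in zip(reads, labels):
            km = kmers(read, self.k)
            self.counts[label].update(km)  # pool k-mer counts per genome label
            self.vocab.update(km)
        total = len(labels)
        self.log_prior = {c: math.log(n_reads[c] / total) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, read):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            denom = self.totals[c] + self.alpha * v
            score = self.log_prior[c]
            for km in kmers(read, self.k):
                # Smoothed log-likelihood of each k-mer under genome c
                score += math.log((self.counts[c][km] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, reads):
        return [self.predict_one(r) for r in reads]


# Toy usage with three made-up "genomes" g1, g2, g3
train_reads = ["ACACACAC", "CACACACA", "GTGTGTGT", "TGTGTGTG", "AAGGAAGG", "GGAAGGAA"]
train_labels = ["g1", "g1", "g2", "g2", "g3", "g3"]
model = KmerNaiveBayes(k=3).fit(train_reads, train_labels)
print(model.predict(["ACACAC", "TGTGTG", "GAAGGA"]))  # → ['g1', 'g2', 'g3']
```

The toy genomes here have deliberately distinct composition; real metagenomic reads would come from much longer genomes whose k-mer profiles overlap, which is where degraded training labels start to matter.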

In particular, when the quality of the training data can be parameterized, we
show that there are phase transitions at which the behavior of the individual
classifiers, as well as the congruence among them, changes dramatically.
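The abstract does not say how training-data quality is parameterized or how congruence is measured. As one possible instance, the sketch below assumes symmetric label noise with a single rate `p` (each training label is replaced, with probability `p`, by a different class chosen uniformly) and measures congruence as the pairwise fraction of validation items on which two classifiers agree. The function names `degrade_labels`, `agreement`, and `congruence_matrix` are hypothetical.

```python
import random
from itertools import combinations


def degrade_labels(labels, p, classes, rng):
    """Hypothetical degradation: with probability p, swap each label for a different class."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    out = []
    for y in labels:
        if rng.random() < p:
            out.append(rng.choice([c for c in classes if c != y]))
        else:
            out.append(y)
    return out


def agreement(preds_a, preds_b):
    """Fraction of items on which two prediction lists coincide."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have equal length")
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def congruence_matrix(predictions):
    """Pairwise agreement for a dict mapping classifier name -> list of predictions."""
    names = sorted(predictions)
    return {(a, b): agreement(predictions[a], predictions[b]) for a, b in combinations(names, 2)}


# Sweep the quality parameter; in a full study each classifier would be
# retrained on the degraded labels at every p and compared on validation data.
rng = random.Random(0)
classes = ["g1", "g2", "g3"]
labels = classes * 20
for p in (0.0, 0.25, 0.5, 2 / 3):
    noisy = degrade_labels(labels, p, classes, rng)
    print(f"p={p:.2f}  fraction of labels kept: {agreement(labels, noisy):.2f}")
```

One design consequence of this particular scheme: with three classes, at `p = 2/3` each degraded label equals the true class with probability 1/3 and each other class with probability 1/3, so the training labels carry no information. A sweep from `p = 0` to `p = 2/3` therefore spans the full range from clean to uninformative training data under this assumption.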

Keywords

Classifier

Training data

Data quality

Phase transition 

Main Sponsor

Section on Statistical Computing