Effect of Training Data Quality on Classifier Performance
Alan Karr
Presenting Author
Temple University
Sunday, Aug 3: 3:20 PM - 3:35 PM
2253
Contributed Papers
Music City Center
When the quality of training data underlying a classifier is
degraded, multiple effects arise: on the boundary structure of the classifier,
on its performance on the training data, and on its performance on validation
data. We illustrate these effects in the context of metagenomic assembly of
short DNA reads arising from one of three genomes, for four classifiers: naive
Bayes classifier, partition model, random forest and neural net.
In particular, when the quality of the training data can be parameterized, we
show the existence of phase transitions where the behavior of the individual
classifiers, as well as the congruence among them, changes dramatically.
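As a rough illustration of the kind of experiment described above, the following is a minimal sketch, not the authors' method. It makes several assumptions that do not come from the abstract: the three "genomes" are toy base-composition profiles (`GENOMES`), data quality is parameterized by a hypothetical per-base substitution rate (`error_rate`), and only a simple naive Bayes classifier on base frequencies is shown. It sweeps the quality parameter and records accuracy on the degraded training data and on clean validation data. A toy this small is not expected to reproduce the phase transitions reported in the talk.

```python
import math
import random

BASES = "ACGT"

# Hypothetical stand-ins for three genomes: each is just a base-composition profile.
GENOMES = {"g1": [4, 2, 2, 2], "g2": [2, 4, 2, 2], "g3": [2, 2, 4, 2]}


def make_reads(weights, n, length, rng):
    """Draw n short reads of independent bases with the given base weights."""
    return ["".join(rng.choices(BASES, weights=weights, k=length)) for _ in range(n)]


def degrade(reads, error_rate, rng):
    """Assumed quality model: each base is replaced by a uniformly random base
    with probability error_rate."""
    return [
        "".join(rng.choice(BASES) if rng.random() < error_rate else b for b in read)
        for read in reads
    ]


def fit_naive_bayes(reads_by_class):
    """Estimate per-class log base probabilities with add-one smoothing."""
    model = {}
    for label, reads in reads_by_class.items():
        counts = {b: 1 for b in BASES}
        for read in reads:
            for b in read:
                counts[b] += 1
        total = sum(counts.values())
        model[label] = {b: math.log(c / total) for b, c in counts.items()}
    return model


def predict(model, read):
    """Return the class with the highest log-likelihood (equal class priors)."""
    return max(model, key=lambda label: sum(model[label][b] for b in read))


def accuracy(model, reads_by_class):
    """Fraction of reads assigned to the class they were generated from."""
    correct = total = 0
    for label, reads in reads_by_class.items():
        for read in reads:
            correct += int(predict(model, read) == label)
            total += 1
    return correct / total


def sweep(error_rates, seed=0, n_reads=200, read_length=50):
    """For each error rate, train on degraded reads and return
    (error_rate, training accuracy, validation accuracy) tuples."""
    rng = random.Random(seed)
    clean_train = {g: make_reads(w, n_reads, read_length, rng) for g, w in GENOMES.items()}
    validation = {g: make_reads(w, n_reads, read_length, rng) for g, w in GENOMES.items()}
    results = []
    for rate in error_rates:
        noisy_train = {g: degrade(reads, rate, rng) for g, reads in clean_train.items()}
        model = fit_naive_bayes(noisy_train)
        results.append((rate, accuracy(model, noisy_train), accuracy(model, validation)))
    return results


if __name__ == "__main__":
    for rate, train_acc, valid_acc in sweep([0.0, 0.25, 0.5, 0.75]):
        print(f"error_rate={rate:.2f}  train={train_acc:.3f}  validation={valid_acc:.3f}")
```

Plotting the two accuracies against `error_rate` on a finer grid, and repeating the sweep for other classifiers, would be one way to look for abrupt changes in behavior and in agreement among classifiers.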
Classifier
Training data
Data quality
Phase transition
Main Sponsor
Section on Statistical Computing