Truth or consequences? A principled path to evaluating classifiers using survey data

Conference: Symposium on Data Science and Statistics (SDSS) 2023
05/26/2023: 10:40 AM - 10:45 AM CDT
Lightning 

Description

Surveys are commonly used to facilitate empirical social science research. Due to practical constraints, they are often not simple random samples; respondents are therefore assigned weights reflecting how many population units each one represents. It is well established that using these weights yields unbiased estimates of population totals and sound explanatory models of outcomes. However, predictive modeling, which has become popular in the social sciences, does not traditionally incorporate survey weights in model development or assessment. This research investigates whether weighted performance metrics computed on survey test data, used alongside well-established model development practices, produce reliable estimates of population performance. We test this using simulated stratified sampling, both under known relationships between predictors and outcomes and with real-world data. We show that unweighted metrics on sample test data, for models fit with default train/test cycles, do not represent population performance, but weighted metrics do. The same holds for models trained using methods that deliberately distort population representation, such as upsampling to mitigate class imbalance. Our results suggest that, regardless of the development procedure, weighted metrics should be used when evaluating performance on sample test data.
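To make the evaluation idea concrete, the following is a minimal sketch in Python with scikit-learn. The two-stratum population, the inclusion probabilities, and all variable names are illustrative assumptions, not the authors' actual simulation design; the point is only that passing inverse-inclusion-probability weights via sample_weight changes the test metric from a sample-level to a population-level estimate.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Illustrative population with two strata whose predictor-outcome
# relationship differs, so sample composition affects performance.
n_pop = 100_000
stratum = rng.binomial(1, 0.2, n_pop)             # 20% in stratum 1
x = rng.normal(loc=stratum.astype(float), size=n_pop)
y = (x + rng.normal(scale=1.0, size=n_pop) > 0.5).astype(int)

# Non-proportional stratified sampling: oversample the small stratum.
p_incl = np.where(stratum == 1, 0.05, 0.005)      # inclusion probabilities
sampled = rng.random(n_pop) < p_incl
xs, ys = x[sampled], y[sampled]
ws = 1.0 / p_incl[sampled]                        # Horvitz-Thompson-style weights

X_tr, X_te, y_tr, y_te, w_tr, w_te = train_test_split(
    xs.reshape(-1, 1), ys, ws, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)

# Unweighted accuracy describes the (unrepresentative) sample itself...
print("unweighted test accuracy:", accuracy_score(y_te, pred))
# ...while weighting each test case by its inverse inclusion probability
# estimates accuracy over the population the sample was drawn from.
print("weighted test accuracy:  ",
      accuracy_score(y_te, pred, sample_weight=w_te))

The same sample_weight argument is accepted by most scikit-learn metrics (e.g., f1_score, roc_auc_score), so the weighted-evaluation step drops into a standard train/test workflow without changing how the model itself is trained.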

Keywords

Surveys

Machine learning

Probability samples

Weighting 

Presenting Author

Adway Wadekar

First Author

Adway Wadekar

Co-Author

Jerome Reiter, Duke University

Target Audience

Mid-Level

Tracks

Practice and Applications