Bias Correction in Machine Learning-based Classification of Rare Events

Conference: Symposium on Data Science and Statistics (SDSS) 2024
06/06/2024: 1:15 PM - 2:45 PM EDT
Refereed 

Description

Online platform businesses can be identified from web-scraped text. This is a classification problem that combines natural language processing with rare event detection. Because online platforms are rare, identifying them accurately with Machine Learning algorithms is challenging: even a small false positive rate among the many non-platform businesses can outnumber the true positives. Here, we describe the development of a Machine Learning-based text classification approach that reduces the number of false positives as much as possible. By using calibrated probabilities and ensembles, the approach greatly reduces the bias in the resulting estimates.
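To illustrate why counting thresholded predictions is biased for rare classes, and how averaging calibrated probabilities can reduce that bias, here is a minimal sketch on synthetic data. It is not the authors' implementation: the abstract does not say which calibration method or ensemble was used, so histogram binning and a small bootstrap ensemble serve as stand-ins. The score distributions, the 1% prevalence, the sample sizes, and all function names are assumptions made for this example. The sketch also assumes the labelled sample is drawn from the same population as the data being estimated.

```python
import random

N_BINS = 10  # assumed number of calibration bins


def simulate(n, prevalence, rng):
    """Synthetic raw scores in [0, 1]: informative, but not calibrated."""
    scores, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < prevalence else 0
        # Assumed score model: positives tend to score high, negatives low.
        s = rng.betavariate(4, 2) if y else rng.betavariate(2, 4)
        scores.append(s)
        labels.append(y)
    return scores, labels


def fit_binning(scores, labels, n_bins=N_BINS):
    """Histogram binning: the observed positive rate in each score bin."""
    pos = [0] * n_bins
    tot = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)
        pos[b] += y
        tot[b] += 1
    base = sum(labels) / len(labels)  # fallback for empty bins
    return [pos[b] / tot[b] if tot[b] else base for b in range(n_bins)]


def calibrated(score, bin_rates):
    """Map a raw score to the calibrated probability of its bin."""
    b = min(int(score * len(bin_rates)), len(bin_rates) - 1)
    return bin_rates[b]


rng = random.Random(0)
train_s, train_y = simulate(20_000, 0.01, rng)  # labelled sample
pop_s, pop_y = simulate(50_000, 0.01, rng)      # population to estimate

# Small ensemble: one calibration map per bootstrap resample of the sample.
ensemble = []
for _ in range(5):
    idx = [rng.randrange(len(train_s)) for _ in train_s]
    ensemble.append(
        fit_binning([train_s[i] for i in idx], [train_y[i] for i in idx])
    )

true_prev = sum(pop_y) / len(pop_y)

# Classify-and-count: share of items whose raw score exceeds 0.5.
count_est = sum(s > 0.5 for s in pop_s) / len(pop_s)

# Probabilistic estimate: mean of the ensemble-averaged calibrated probabilities.
calib_est = sum(
    sum(calibrated(s, m) for m in ensemble) / len(ensemble) for s in pop_s
) / len(pop_s)

print(f"true prevalence:      {true_prev:.4f}")
print(f"classify-and-count:   {count_est:.4f}")
print(f"calibrated ensemble:  {calib_est:.4f}")
```

In this setup the thresholded count is dominated by false positives from the large negative class, whereas the mean calibrated probability stays close to the true prevalence. How closely this carries over to real web-scraped text depends on how well the calibration holds on the target population.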

Keywords

Calibration

Population

Ensembles 

Presenting Author

Piet Daas, Statistics Netherlands & Eindhoven University of Technology

First Author

Luuk Gubbels, Eindhoven University of Technology

CoAuthor(s)

Marco Puts, Statistics Netherlands
Piet Daas, Statistics Netherlands & Eindhoven University of Technology

Tracks

Statistical Data Science
Symposium on Data Science and Statistics (SDSS) 2024