Selection bias in big data in official statistics from a practitioner’s point of view
Tuesday, Aug 6: 9:35 AM - 9:50 AM
3399
Contributed Papers
Oregon Convention Center
Which is better: a simple random sample (SRS) with nonresponse or a large dataset with selection bias?
Selection bias is an increasing problem in surveys. Inspired by Meng's (2018) paper "Statistical paradises and paradoxes in big data", which highlights the statistical issues that the sheer size of a data set brings, we simulated sequences of growing populations under two data collection methods: a simple random sample (SRS) with nonresponse, and organic data with selection bias ("big data"). The results showed a trade-off between bias and coverage probability caused by the amount of data available. Users of statistics often focus on good point estimates, in which case a large nonprobability data source may be better than an SRS with nonresponse. If, on the other hand, the user wants a reliable confidence interval, a probability sample with missingness may be preferred. Tools for comparing different data sources were investigated and are discussed from a practical point of view.
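The kind of comparison described above can be illustrated with a minimal simulation sketch. It is not the authors' design: the standard normal outcome, the logistic response and selection models, their coefficients, the SRS size, and the function names (`simulate`, `naive_ci`, `logistic`) are all assumptions chosen for illustration, and which source comes out ahead depends on those choices.

```python
import math
import random


def logistic(x):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def naive_ci(sample):
    """Sample mean and 95% half-width, treating the data as an SRS."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((v - mean) ** 2 for v in sample) / (n - 1)
    return mean, 1.96 * math.sqrt(var / n)


def simulate(pop_size, srs_size=500, reps=100, seed=1):
    """Compare an SRS with nonresponse against a large self-selected data set.

    All distributions and coefficients are hypothetical (assumptions).
    """
    rng = random.Random(seed)
    population = [rng.gauss(0.0, 1.0) for _ in range(pop_size)]
    true_mean = sum(population) / pop_size
    # Assumed "big data" mechanism: weak selection on y, about half the population.
    big_probs = [logistic(0.1 * y) for y in population]

    totals = {"srs": [0.0, 0], "big": [0.0, 0]}  # [sum of |error|, CIs covering]
    for _ in range(reps):
        # Assumed SRS mechanism: stronger dependence of response on y.
        drawn = rng.sample(population, srs_size)
        srs = [y for y in drawn if rng.random() < logistic(1.0 + 0.3 * y)]
        big = [y for y, p in zip(population, big_probs) if rng.random() < p]
        for name, sample in (("srs", srs), ("big", big)):
            mean, half = naive_ci(sample)
            totals[name][0] += abs(mean - true_mean)
            totals[name][1] += int(abs(mean - true_mean) <= half)

    return {
        name: {"mean_abs_error": err / reps, "coverage": hits / reps}
        for name, (err, hits) in totals.items()
    }
```

Calling `simulate` for a sequence of increasing `pop_size` values mimics the growing populations mentioned in the abstract. Under these assumed settings the self-selected data set tends to give the smaller point-estimate error, while its naive confidence interval, which shrinks with the amount of data, rarely covers the true mean.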
Selection bias
Simulation study
Bias-variance tradeoff
Non-response bias
Main Sponsor
Survey Research Methods Section