Selection bias in big data in official statistics from a practitioner’s point of view
Abstract Number:
3399
Submission Type:
Contributed Abstract
Contributed Abstract Type:
Paper
Participants:
Martin Hyllienmark (1), Dan Hedlin (1), Edgar Bueno (1)
Institutions:
(1) Stockholm University, Stockholm, Sweden
Co-Author(s):
First Author:
Presenting Author:
Abstract Text:
Which is better: a simple random sample (SRS) with nonresponse or a large dataset with selection bias?
Selection bias is an increasingly serious problem in surveys. Inspired by Meng's (2018) paper "Statistical paradises and paradoxes in big data", which highlights the statistical issues that arise from the sheer size of data sets, we simulated sequences of growing populations under two data collection methods: an SRS with nonresponse, and organic data with selection bias ("big data"). The results showed a trade-off between bias and confidence-interval coverage probability, driven by the amount of data available. Users of statistics often focus on good point estimates; in that case, a large nonprobability data source may be preferable to an SRS with nonresponse. If, on the other hand, the user wants a reliable confidence interval, a probability sample with missingness may be preferred. Tools for comparing different data sources were investigated and discussed from a practical point of view.
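To make the comparison concrete, the following is a minimal sketch of this kind of simulation. It is not the authors' design: the normal population (mean 50, SD 10), the SRS size of 200, the 100 replicates, and both logistic selection mechanisms are hypothetical choices. In each mechanism the chance of being observed depends mildly on the study variable itself, which makes the missingness not at random. For each data source it reports the bias of the unweighted mean and the empirical coverage of a naive 95% normal-approximation interval, which ignores the finite-population correction.

```python
import math
import random
import statistics


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def naive_ci(values, z=1.96):
    """Normal-approximation 95% CI for a mean (no finite-population correction)."""
    m = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - z * se, m + z * se


def simulate(N, n_srs=200, reps=100, seed=2024):
    """Return {"srs": (bias, coverage), "big": (bias, coverage)} for a population of size N.

    Hypothetical mechanisms (z = standardized study variable):
      * SRS: each sampled unit responds with probability logistic(0.5 + 0.2 * z)
      * big data: each population unit is recorded with probability logistic(0.2 * z)
    """
    rng = random.Random(seed)
    y = [rng.gauss(50, 10) for _ in range(N)]  # fixed finite population
    true_mean = statistics.fmean(y)
    sd = statistics.pstdev(y)
    z = [(v - true_mean) / sd for v in y]
    p_resp = [logistic(0.5 + 0.2 * zi) for zi in z]  # hypothetical response propensities
    p_big = [logistic(0.2 * zi) for zi in z]  # hypothetical inclusion propensities

    results = {"srs": [], "big": []}
    for _ in range(reps):
        # SRS without replacement, then value-dependent nonresponse
        sample = rng.sample(range(N), n_srs)
        respondents = [y[i] for i in sample if rng.random() < p_resp[i]]
        # Organic data: each unit self-selects independently of the others
        big = [y[i] for i in range(N) if rng.random() < p_big[i]]
        for key, data in (("srs", respondents), ("big", big)):
            lo, hi = naive_ci(data)
            results[key].append((statistics.fmean(data), lo <= true_mean <= hi))

    summary = {}
    for key, rows in results.items():
        bias = statistics.fmean(est for est, _ in rows) - true_mean
        coverage = sum(covered for _, covered in rows) / reps
        summary[key] = (bias, coverage)
    return summary


if __name__ == "__main__":
    # A short sequence of growing populations
    for N in (1_000, 10_000, 30_000):
        s = simulate(N)
        print(f"N={N:>6}  SRS bias={s['srs'][0]:+.2f} cover={s['srs'][1]:.2f}  "
              f"big bias={s['big'][0]:+.2f} cover={s['big'][1]:.2f}")
```

In this sketch the big-data bias stays roughly constant as N grows, while its naive interval narrows in proportion to the square root of the number of records, so its coverage falls toward zero. The SRS keeps a fixed size, so its interval stays wide enough to cover the true mean in most replicates despite a small nonresponse bias.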
Keywords:
Selection bias|Simulation study|Bias-variance tradeoff|Nonresponse bias
Sponsors:
Survey Research Methods Section
Tracks:
Non-probability Samples
Can this be considered for alternate subtype?
Yes
Are you interested in volunteering to serve as a session chair?
No
I have read and understand that JSM participants must abide by the Participant Guidelines.
Yes
I understand that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is non-refundable.
I understand