PRITS Framework for Investigating and Assessing Web-scraped Datasets for Research Applications
Abstract Number:
2080
Submission Type:
Contributed Abstract
Contributed Abstract Type:
Poster
Participants:
Cynthia Huang (1), Tina Lam (2), Mitchell O'Hara-Wild (2)
Institutions:
(1) N/A, N/A, (2) Monash University, N/A
Abstract Text:
The PRITS framework addresses the lack of integrated technical and statistical guidance on two tasks: programmatically collecting data from online sources, and assessing existing web-scraped datasets for specific research uses. The framework covers five stages: Planning, Retrieval, Investigation, Transformation and Summary (PRITS). The 'Planning' stage focuses on defining the problem and context, and on sampling design. 'Retrieval' involves the technical execution and automated documentation of web-scraping processes and outputs (i.e. paradata and substantive data). 'Investigation' assesses the content and completeness of the retrieved web response objects. 'Transformation' involves parsing and cleaning the retrieved web data, potentially integrating it with other data, and documenting key decisions such as imputation or harmonisation strategies. Finally, the 'Summary' stage documents any decisions that might materially affect downstream analysis, and describes key properties (i.e. metadata) and limitations of the final web-scraped dataset.
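As an illustration only, the 'Retrieval' and 'Investigation' stages could be sketched in Python along the following lines. This is a minimal sketch and not the framework's own implementation: the `RetrievalRecord` fields, the `retrieve` and `investigate` helpers, the `fake_fetch` stand-in for a real HTTP client, the example URLs, and the choice of a marker string as a content check are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class RetrievalRecord:
    """One retrieval attempt: substantive data plus paradata about the request."""
    url: str
    body: Optional[str]      # substantive data (None if the request raised)
    status: Optional[int]    # paradata: HTTP status code
    retrieved_at: str        # paradata: UTC timestamp in ISO 8601 format
    sha256: Optional[str]    # paradata: hash of the body, for later verification
    error: Optional[str] = None  # paradata: error description, if any


def retrieve(url: str, fetch: Callable[[str], Tuple[int, str]]) -> RetrievalRecord:
    """Retrieval: call `fetch` (hypothetical; returns (status, body)) and log paradata."""
    timestamp = datetime.now(timezone.utc).isoformat()
    try:
        status, body = fetch(url)
    except Exception as exc:  # keep failed attempts in the record set
        return RetrievalRecord(url, None, None, timestamp, None, repr(exc))
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return RetrievalRecord(url, body, status, timestamp, digest)


def is_ok(record: RetrievalRecord) -> bool:
    """A response counts as retrieved if it has status 200 and a non-empty body."""
    return record.status == 200 and bool(record.body)


def investigate(records: List[RetrievalRecord], marker: str) -> Dict[str, object]:
    """Investigation: summarise completeness and a simple content check."""
    ok = [r for r in records if is_ok(r)]
    return {
        "planned": len(records),
        "retrieved_ok": len(ok),
        "contain_marker": sum(marker in r.body for r in ok),
        "missing_urls": [r.url for r in records if not is_ok(r)],
    }


def fake_fetch(url: str) -> Tuple[int, str]:
    """Stand-in for a real HTTP client so the sketch runs offline."""
    pages = {
        "https://example.org/a": (200, "<html><span class='price'>10</span></html>"),
        "https://example.org/b": (200, "<html>no listing</html>"),
        "https://example.org/c": (404, ""),
    }
    return pages[url]  # unknown URLs raise KeyError, simulating a failed request


urls = [f"https://example.org/{name}" for name in "abcd"]
records = [retrieve(u, fake_fetch) for u in urls]
summary = investigate(records, marker="class='price'")
print(summary)
```

Keeping failed attempts as records, rather than dropping them, is what lets the completeness summary compare what was planned against what was actually retrieved.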
Keywords:
internet data|sampling design|web scraping|data quality
Sponsors:
Survey Research Methods Section
Tracks:
Non-probability Samples
Can this be considered for alternate subtype?
Yes
Are you interested in volunteering to serve as a session chair?
Yes
I have read and understand that JSM participants must abide by the Participant Guidelines.
Yes
I understand that JSM participants must register and pay the appropriate registration fee by June 3, 2025. The registration fee is non-refundable.
I understand