43: PRITS framework for investigating and assessing web-scraped datasets for research applications
Monday, Aug 4: 2:00 PM - 3:50 PM
2080
Contributed Posters
Music City Center
The PRITS framework addresses the lack of integrated technical and statistical guidance on two tasks: programmatically collecting data from online sources, and assessing whether existing web-scraped datasets are fit for specific research uses. The framework covers five stages: Planning, Retrieval, Investigation, Transformation and Summary (PRITS). The 'Planning' stage focuses on defining the problem and its context, and on sampling design. 'Retrieval' involves the technical execution and automated documentation of web-scraping processes and their outputs (i.e. paradata and substantive data). 'Investigation' assesses the content and completeness of the retrieved web response objects. 'Transformation' involves parsing and cleaning the retrieved web data, integrating it with other data where needed, and documenting key decisions such as imputation or harmonisation strategies. Finally, the 'Summary' stage documents any decisions that might materially affect downstream analysis, and describes the key properties (i.e. metadata) and limitations of the final web-scraped dataset.
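As a rough illustration of how the 'Retrieval' and 'Investigation' stages might fit together, the Python sketch below records paradata for each retrieval attempt and then summarises how many responses look usable. It is not taken from the PRITS framework itself: the function names (`build_paradata`, `investigate`), the recorded fields, and the usability rule (HTTP status 200 and a non-empty body) are all assumptions made for this example, and the responses are hard-coded rather than fetched from the web.

```python
import hashlib
from datetime import datetime, timezone


def build_paradata(url, status_code, body, requested_at=None):
    """Describe one retrieval attempt (paradata), kept apart from the page content.

    The field names are hypothetical; a real project would choose its own.
    """
    requested_at = requested_at or datetime.now(timezone.utc)
    return {
        "url": url,
        "requested_at": requested_at.isoformat(),
        "status_code": status_code,
        "body_bytes": len(body),
        # A content hash lets later stages detect changed or duplicated pages.
        "body_sha256": hashlib.sha256(body).hexdigest(),
    }


def is_usable(record, min_bytes=1):
    """Assumed rule: a response is usable if it returned 200 with a non-empty body."""
    return record["status_code"] == 200 and record["body_bytes"] >= min_bytes


def investigate(records, min_bytes=1):
    """Summarise the completeness of a batch of retrieval records."""
    total = len(records)
    usable = [r for r in records if is_usable(r, min_bytes)]
    return {
        "n_requested": total,
        "n_usable": len(usable),
        "share_usable": len(usable) / total if total else 0.0,
        "failed_urls": [r["url"] for r in records if not is_usable(r, min_bytes)],
    }


# Hard-coded stand-ins for web responses (placeholder URLs, no network access).
records = [
    build_paradata("https://example.org/a", 200, b"<html>ok</html>"),
    build_paradata("https://example.org/b", 404, b""),
    build_paradata("https://example.org/c", 200, b""),
]
report = investigate(records)
print(report)
```

Keeping the paradata record separate from the page body means the completeness summary can be recomputed, or the usability rule changed, without repeating the scrape.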
internet data
sampling design
web scraping
data quality
Main Sponsor
Survey Research Methods Section