Inside Out: Externalizing Assumptions in Data Analysis as Validation Checks
Sherry Zhang
Presenting Author
The University of Texas at Austin
Tuesday, Aug 5: 9:50 AM - 10:05 AM
1372
Contributed Papers
Music City Center
In data analysis, unexpected results often prompt researchers to revisit their procedures to identify potential issues. While some researchers may struggle to identify the root causes, experienced researchers can often quickly diagnose problems by checking a few key assumptions. These checked assumptions, or expectations, are typically informal, difficult to trace, and rarely discussed in publications. In this paper, we introduce the term analysis validation checks to formalize and externalize these informal assumptions. We then introduce a procedure to identify a subset of checks that best predict the occurrence of unexpected outcomes, based on simulations of the original data. The checks are evaluated in terms of accuracy, determined by binary classification metrics, and independence, which measures the shared information among checks. We demonstrate this approach with a toy example using step count data and a generalized linear model example examining the effect of particulate matter air pollution on daily mortality.
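As a rough illustration of the evaluation described in the abstract, the sketch below scores two hypothetical checks on simulated step count data. It is not the authors' implementation: the function names, the thresholds, the simulation settings, and the choice of precision, recall, and mutual information as the accuracy and independence measures are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def binary_metrics(check_failed, unexpected):
    """Precision and recall of a failed check as a predictor of an unexpected outcome."""
    pairs = list(zip(check_failed, unexpected))
    tp = sum(1 for c, u in pairs if c and u)
    fp = sum(1 for c, u in pairs if c and not u)
    fn = sum(1 for c, u in pairs if not c and u)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mutual_information(a, b):
    """Mutual information in bits between two binary sequences (0 means no shared information)."""
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum(
        (c / n) * math.log2(c * n / (pa[x] * pb[y]))
        for (x, y), c in joint.items()
    )


def simulate_flags(n_sims=200, n_days=30, seed=0):
    """Simulate step counts and record two hypothetical checks plus the outcome flag."""
    rng = random.Random(seed)
    low_min, high_max, unexpected = [], [], []
    for _ in range(n_sims):
        steps = [max(0.0, rng.gauss(8000, 2500)) for _ in range(n_days)]
        if rng.random() < 0.3:  # assumed failure mode: device not worn on 5 days
            for i in rng.sample(range(n_days), 5):
                steps[i] = 0.0
        low_min.append(min(steps) < 500)        # check 1 fails: implausibly low day
        high_max.append(max(steps) > 15000)     # check 2 fails: implausibly high day
        unexpected.append(sum(steps) / n_days < 7000)  # assumed "unexpected" result
    return low_min, high_max, unexpected


if __name__ == "__main__":
    low_min, high_max, unexpected = simulate_flags()
    print("low-min check (precision, recall):", binary_metrics(low_min, unexpected))
    print("high-max check (precision, recall):", binary_metrics(high_max, unexpected))
    print("shared information (bits):", mutual_information(low_min, high_max))
```

Under these assumptions, a useful subset would favor checks that score well on the classification metrics while sharing little information with one another.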
data analysis
data validation
diagnostics
Main Sponsor
Section on Statistical Computing