62: Accounting for Systematic Biases in Transportation Data

Linda Boyle Co-Author
University of Washington, Industrial & Systems Engineering
 
Grace Douglas First Author
New York University C2SMARTER Institute
 
Grace Douglas Presenting Author
New York University C2SMARTER Institute
 
Wednesday, Aug 6: 10:30 AM - 12:20 PM
1826 
Contributed Posters 
Music City Center 
Transportation data on real-world events can be quite messy. Models trained on these data often exhibit misclassification patterns that affect the inferences drawn from them. This is a particular issue in safety research, where the models are used for crash prediction. This study presents a framework for identifying and analyzing systematic prediction error. Data on pedestrian-vehicle crashes at intersections in Seattle, Washington, are used to distinguish between locations prone to temporally systematic and spatially random prediction biases. The framework identified significant geographic heterogeneity in model performance and temporally consistent error patterns. A manual labeling protocol using Google Street View revealed environmental features (e.g., sight-line obstructions, infrastructure conditions) that were absent from the original training data. By identifying the spatial and temporal components that contribute to systematic biases in naturalistic data, the analysis reduced manual review requirements. The framework can be used in future crash prediction models to establish protocols for systematic pattern detection and new feature extraction.
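
As an illustration only, the temporal side of such an analysis might be sketched as follows. This is not the authors' framework: the function name, the 0.75 threshold, the label strings, and the toy records are all hypothetical, and the sketch assumes each intersection has one observed and one predicted class per period. It flags intersections whose misclassifications persist across periods, which would be the natural candidates for manual review.

```python
from collections import defaultdict


def classify_error_persistence(records, min_share=0.75):
    """Label each site by how persistently the model misclassifies it.

    records: iterable of (site_id, period, observed, predicted) tuples.
    min_share: hypothetical cutoff for the share of periods in error.
    """
    errors = defaultdict(dict)
    for site_id, period, observed, predicted in records:
        # True when the prediction disagrees with the observed class.
        errors[site_id][period] = observed != predicted

    labels = {}
    for site_id, by_period in errors.items():
        share = sum(by_period.values()) / len(by_period)
        if share >= min_share:
            labels[site_id] = "temporally systematic"
        elif share > 0:
            labels[site_id] = "sporadic"
        else:
            labels[site_id] = "no error"
    return labels


# Toy data (invented): three intersections over four years.
records = [
    ("A", 2019, 1, 0), ("A", 2020, 1, 0), ("A", 2021, 1, 0), ("A", 2022, 1, 1),
    ("B", 2019, 0, 0), ("B", 2020, 1, 0), ("B", 2021, 0, 0), ("B", 2022, 0, 0),
    ("C", 2019, 0, 0), ("C", 2020, 0, 0), ("C", 2021, 0, 0), ("C", 2022, 0, 0),
]
labels = classify_error_persistence(records)
print(labels)
```

Under these assumptions, only sites labeled "temporally systematic" would be sent for Street View inspection, which is one way a review workload could be narrowed.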

Keywords

crash modeling

misclassification

machine learning

Google Street View

validation

framework 

Main Sponsor

Transportation Statistics Interest Group