Unstructured Data in Policing: Prospects and Challenges

Tarak Shah, Speaker
Human Rights Data Analysis Group
 
Wednesday, Aug 7: 9:50 AM - 10:15 AM
Invited Paper Session 
Oregon Convention Center 
Key information about police incidents is often found not in standardized, structured databases but in unstructured sources, especially written or typed documents, reports, transcripts, and administrative forms. At the recent Ingram Olkin Forum on "Statistical Challenges in the Analysis of Police Use of Force", researchers and practitioners explored the challenges of incorporating these data sources into analyses and approaches for doing so. We look at where unstructured document collections come from, along with examples of public records requests for such collections. We then review scalable approaches to managing files in large collections, including duplicate detection. Next, we highlight key technical tasks in processing unstructured collections and review existing tools and approaches for each. We finish by discussing how to evaluate the quality of structured data extracted from unstructured collections.
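Duplicate detection in a large document collection can begin with exact matching. The Python sketch below is illustrative only and is not the approach presented in the talk. It assumes duplicates are byte-identical files, groups files by size first, and then groups same-size files by SHA-256 digest. The function names `sha256_of_file` and `find_exact_duplicates` are hypothetical. Near-duplicates, such as a rescanned page or a re-exported PDF, would not match under this approach and would need fuzzier methods, for example comparing extracted text.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so large scans do not have to fit in memory (hypothetical helper)."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content hash and keep only groups
    with more than one file. Assumes duplicates are byte-identical."""
    # Cheap pre-filter: only files of the same size can be byte-identical.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Hash only files whose size is shared with at least one other file.
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size, so it cannot be an exact duplicate
        for p in paths:
            by_hash[sha256_of_file(p)].append(p)

    return {digest: sorted(ps) for digest, ps in by_hash.items() if len(ps) > 1}
```

Grouping by size before hashing avoids reading files whose size is unique, which can save substantial time when a records release contains many large scanned documents.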