10/07/2022: 11:45 AM - 1:15 PM CDT
Concurrent
Room: Grand Ballroom Salon E
Chair
Parul Agarwal
Presentations
Criminal justice research often requires converting open-ended, free-text offense descriptions into overall charge categories to aid analysis. For example, the free-text offense of "eluding a police vehicle" would be coded to a charge category of "Obstruction - Law Enforcement". Because free-text offense descriptions are not standardized and often must be categorized in large volumes, coding them is a manual, time-intensive process for researchers. Using a machine learning model trained on publicly available national data, we present a web application that converts offense text stored in common formats (e.g., XLSX, CSV) in bulk into offense categories used in criminal justice. This reduces an hours-long coding task to minutes, with an overall accuracy of 93%.
Presenting Author
Anna Godwin, RTI International
First Author
Anna Godwin, RTI International
CoAuthor
Emily Hadley, RTI International
Point process models rely on the availability of point-level data, that is, the precise location (e.g., latitude/longitude coordinates) associated with each observed event. Uncertainty in point-level data sets arises from many sources, including privacy-preserving methods, geocoding algorithms, and data-gathering mechanisms. Privacy-preserving methods, such as radial perturbation, purposefully move points to protect the original location. Geocoding, the process of transforming addresses into coordinates, often introduces uncertainty into the geocoded point due to technological limitations. Datasets collected from news articles allow for novel analyses of challenging problems but can also lead to less precise event locations. We analyze the impact of uncertainty in point locations and propose measures that analysts can take to address this uncertainty. We focus our discussion on jittered crime data in the city of Cincinnati and on simulated cases.
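As an illustration of the radial perturbation mentioned in the abstract, the following is a minimal sketch and not the authors' method. It assumes planar (projected) coordinates in meters and a displacement drawn uniformly over a disc; the function name `radial_perturb` and its parameters `max_radius` and `min_radius` are hypothetical.

```python
import math
import random


def radial_perturb(x, y, max_radius, rng, min_radius=0.0):
    """Move a point a random distance in a random direction.

    Assumption: coordinates are planar (e.g., projected meters), and the
    displaced point is uniform over the annulus between min_radius and
    max_radius around the original location.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Drawing the squared radius uniformly gives a uniform density over area.
    radius = math.sqrt(rng.uniform(min_radius ** 2, max_radius ** 2))
    return x + radius * math.cos(angle), y + radius * math.sin(angle)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
points = [(0.0, 0.0), (100.0, 250.0), (-40.0, 75.0)]  # made-up locations
jittered = [radial_perturb(x, y, max_radius=50.0, rng=rng) for x, y in points]

# Distance each point was moved; every value is at most max_radius.
displacements = [
    math.hypot(jx - x, jy - y) for (x, y), (jx, jy) in zip(points, jittered)
]
```

Setting `min_radius` above zero would guarantee that no point stays at its original location, at the cost of a larger minimum positional error for the analyst.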
Presenting Author
Claire Kelling, Carleton College
First Author
Claire Kelling, Carleton College
CoAuthor(s)
Murali Haran, Pennsylvania State University
Aleksandra Slavkovic, Pennsylvania State University
Despite widespread concern about homelessness, fundamental questions about the size and characteristics of this hard-to-study population are unresolved, in large part because it is unclear whether existing data are sufficiently complete and reliable. We examine these questions as well as the coverage of new microdata sources that are designed to be nationally representative and will allow groundbreaking new analyses. We compare three largely unused, restricted-use data sources to the less detailed public-use data. In doing this triangulation of sources, we examine the completeness and accuracy of available data and improve our understanding of the size of the homeless population and its inclusion in widely used household surveys. Specifically, we compare restricted data from the 2010 Census, the American Community Survey (ACS), and the Homeless Management Information System (HMIS) to HUD's public-use point-in-time (PIT) estimates and the Housing Inventory Count (HIC) at the national, city and county, and person levels. We explore the extent to which definitional differences, weighting methodology, frame completeness, and seasonality explain discrepancies between sources. We also link HMIS shelter use data to the Census to evaluate the usefulness of these microdata for studying the homeless population. Our analyses suggest that on any given night there are 500,000-600,000 people experiencing homelessness in the U.S., about one-third of whom are sleeping on the streets and two-thirds in homeless shelters, about 80-95 percent of whom were counted in the Census. Despite employing substantially different methods, the Census, ACS, and PIT arrive at similar estimates after accounting for definitional differences, ambiguity in the classification of certain facilities, and differences arising from the timeframe of Census response. The coverage of these sources is surprisingly good given the difficulties of surveying this population.
By establishing the broad coverage and reliability of the new data sources, this paper lays the foundation for groundbreaking future work on the characteristics, income, safety net participation, mortality, migration, geographic distribution, and housing status transitions of the U.S. homeless population.
Presenting Author
Angela Wyse, The University of Chicago
First Author
Angela Wyse, The University of Chicago
CoAuthor(s)
Kevin Corinth, The University of Chicago
Bruce Meyer, The University of Chicago