Data-Driven Justice: Transforming Forensic Science with Statistics, AI, and Data Science

Chair: Michael Rosenblum, Johns Hopkins University, Bloomberg School of Public Health

Discussant: Hal Stern, University of California, Irvine

Organizer: Maria Cuellar, University of Pennsylvania

Monday, Aug 4: 10:30 AM - 12:20 PM
Session 0642: Topic-Contributed Paper Session
Music City Center, Room CC-212

Applied: Yes

Main Sponsor

Forensic Statistics Interest Group

Co-Sponsors

Advisory Committee on Forensic Science
Committee on Law and Justice Statistics

Presentations

Evaluating the Scientific Foundations of Latent Fingerprint Comparisons

Multiple reviews, including those by the National Academy of Sciences (2009), PCAST (2016), and AAAS (2017), have concluded that forensic latent fingerprint comparison lacks empirical validation. Scientific validity requires rigorously designed studies of examiner performance: accuracy, repeatability, and reproducibility. We performed a systematic review of black-box studies evaluating latent fingerprint comparisons and found that all suffer from fundamental design and statistical flaws. These flaws (including inadequate sample sizes, non-representative samples and test conditions, improper handling of inconclusives, and flawed error rate estimation) render the studies invalid for establishing the field's scientific validity. Furthermore, these studies omit key elements of real casework, such as database searches (AFIS), contextual bias, and real-world complexity. As a result, error rates of latent fingerprint examiners remain unknown, and claims of reliability and accuracy lack scientific support. We offer recommendations for future studies to ensure valid experimental design, statistical analysis, and real-world relevance, advancing the field toward scientific rigor and admissibility. 
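The abstract's point about the handling of inconclusives can be made concrete with a toy calculation. The counts below are hypothetical and not drawn from any study; they only illustrate how the choice of denominator changes a reported false-positive rate.

```python
# Illustrative only: how the treatment of "inconclusive" responses changes a
# reported false-positive rate. All counts are hypothetical.
false_positives = 4        # different-source pairs called "identification"
correct_exclusions = 320   # different-source pairs called "exclusion"
inconclusives = 176        # different-source pairs called "inconclusive"

# Dropping inconclusives from the denominator (a common choice in black-box
# studies) yields a smaller denominator and a larger apparent accuracy base:
fpr_excluded = false_positives / (false_positives + correct_exclusions)

# Keeping inconclusives in the denominator dilutes the rate:
fpr_included = false_positives / (
    false_positives + correct_exclusions + inconclusives
)

print(f"FPR excluding inconclusives: {fpr_excluded:.3%}")
print(f"FPR including inconclusives: {fpr_included:.3%}")
```

Neither convention is self-evidently correct, which is exactly why the estimation procedure must be specified and justified in advance.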

Co-Author(s)

Michael Rosenblum, Johns Hopkins University, Bloomberg School of Public Health
Amanda Luby, Carleton College
Maria Cuellar, University of Pennsylvania

Speaker

Michael Rosenblum, Johns Hopkins University, Bloomberg School of Public Health

Hidden Multiple Comparisons Increase Forensic Error Rates

When a wire is cut, the tool leaves striations on the cut surface; as in other forms of forensic analysis, these striation marks are used to connect the evidence to the source that created them. We argue that comparing two wire cut surfaces introduces complexities not present in better-studied toolmark examinations, such as those of marks on bullets: wire comparisons inherently require multiple distinct comparisons, which increases the expected false discovery rate. We call attention to the multiple comparison problem in wire examination and relate it to other forensic settings that involve multiple comparisons, such as database searches.
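The multiplicity effect described above can be sketched with a standard calculation. Assume, purely for illustration, that each individual surface-to-surface comparison carries a small false-positive probability p and that an examination requires k independent comparisons; both values are hypothetical.

```python
# Sketch of how multiple comparisons inflate the chance of a spurious "match".
# p and k are hypothetical; real comparisons are also unlikely to be independent.
def familywise_false_positive(p: float, k: int) -> float:
    """Probability of at least one false positive among k independent
    comparisons, each with individual false-positive probability p."""
    return 1 - (1 - p) ** k

p = 0.01
for k in (1, 4, 16, 64):
    print(f"k={k:3d}: P(at least one false positive) = "
          f"{familywise_false_positive(p, k):.3f}")
```

Even a 1% per-comparison error rate compounds quickly as k grows, which is the core of the concern about wire examinations and database searches.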

Speaker

Susan Vanderplas, University of Nebraska–Lincoln

Likelihood of Randomly Acquired Characteristics (RACs) in Crime Scene Shoeprints

Forensic shoeprint examiners have been criticized for relying on subjective rather than objective, quantitative methods to determine whether a suspect's shoe sole matches a crime scene impression. After confirming a match in pattern, size, and wear, experts assess whether randomly acquired characteristics (RACs) correspond. A correspondence of rare RACs supports the hypothesis that the two impressions originate from the same source.
This study, conducted with the Israel Police and the Center for Statistics and Applications in Forensic Evidence, examines RAC visibility in crime scene-like conditions rather than ideal lab settings. Controlled experiments with pre-selected shoes featuring numerous RACs simulated thefts, leaving impressions on various surfaces. A total of 302 shoeprints from 30 simulated crime scenes were analyzed, identifying 488 RACs.
The study assesses how RAC visibility varies by print quality and surface type and develops a probabilistic model estimating the likelihood of RACs appearing in specific locations. The collected data serve as a basis for black-box studies on examiner decision-making and for improving the reliability of RAC analysis in forensic practice.
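One way a probabilistic model of RAC visibility could be sketched is as a logistic regression of visibility on print quality and surface type. This is not the authors' model; the functional form and coefficients below are invented for illustration.

```python
import math

# Hypothetical sketch: probability that a given RAC is visible in a crime
# scene print, modeled as a logistic function of print quality and surface
# type. Coefficients are invented, not estimated from the study's data.
COEF = {"intercept": -1.0, "quality": 2.5, "surface_hard": 0.8}

def p_visible(quality: float, hard_surface: bool) -> float:
    """quality in [0, 1]; hard_surface toggles a surface-type effect."""
    eta = (COEF["intercept"]
           + COEF["quality"] * quality
           + COEF["surface_hard"] * (1.0 if hard_surface else 0.0))
    return 1.0 / (1.0 + math.exp(-eta))

print(p_visible(0.9, True))   # high-quality print on a hard surface
print(p_visible(0.3, False))  # poor print on a soft surface
```

A model of this kind, fit to the 488 observed RACs, would let examiners weight a correspondence by how likely the RAC was to be visible at all under the scene conditions.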
 

Speaker

Naomi Kaplan-Damary, The Hebrew University of Jerusalem

Reporting Scales for Forensic Source Comparisons

In most forensic science disciplines, there is no objective way to determine whether two fingerprints, bullets, or handwritten documents come from the same person. Instead, individual analysts make subjective conclusions and communicate their results to a judge or jury. Some examiners appear to be well-calibrated with each other and use the full range of possible outcomes, while others tend to overuse some categories and underuse others. Recently, there have been proposals to shift from a 3-category scale (e.g., "same source", "different source", and "inconclusive") to 5- or 7-category scales (e.g., "strong support for same source", "moderate support for same source", etc.). Using Item Response Theory-based tools from standardized testing, we quantify these individual differences and illustrate the potential implications of different reporting scales via simulation. Since examiner conclusions can influence investigator, judge, and jury decisions, it is important to measure and understand the range of individual differences in reporting styles before adopting a more complicated scale.
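The phenomenon of examiners using the same scale differently can be illustrated with a toy simulation. This is not the Item Response Theory machinery the abstract describes; it is a minimal sketch in which two hypothetical examiners apply different cut points to the same latent strength-of-evidence values.

```python
import random
from collections import Counter

# Toy sketch: identical evidence, different reporting styles. Examiner B uses
# wider cut points and so reports "inconclusive" far more often than A.
random.seed(1)
latent = [random.gauss(0, 1) for _ in range(10_000)]  # latent evidence strength

def categorize(x: float, lo: float, hi: float) -> str:
    if x < lo:
        return "different source"
    if x > hi:
        return "same source"
    return "inconclusive"

cutpoints = {"examiner_A": (-0.5, 0.5), "examiner_B": (-1.5, 1.5)}
usage = {name: Counter(categorize(x, lo, hi) for x in latent)
         for name, (lo, hi) in cutpoints.items()}
for name, counts in usage.items():
    print(name, dict(counts))
```

With more categories, there are more cut points on which examiners can disagree, which is one reason to measure these differences before adopting a finer scale.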

Speaker

Amanda Luby, Carleton College

Specific Source Machine Learning Score-based Likelihood Ratios for Forensic Evidence

The specific source problem refers to a type of inference in forensic science where the aim is to assess whether a particular source generated the evidence or whether it came from an alternative, unknown source. Score-based likelihood ratios (SLRs) quantify the relative likelihood of the evidence under the two propositions for complex features. This analysis requires conditional inference, but data for the specific source (e.g., control items related to the person of interest) are often scarce, making the approach practically infeasible. Furthermore, the dependence structure created by the current procedure for generating training data for machine learning algorithms can reduce the performance of such SLR systems. To address this, we propose creating synthetic items to train machine learning algorithms for the specific source problem. Simulation results show that our approach achieves a high level of agreement with an ideal scenario in which data are plentiful and independent. We also present real-world applications in forensic science.
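The basic SLR construction can be sketched as follows. The scores here are simulated (in a real system they would come from, e.g., a trained random forest comparing feature vectors), and each score distribution is modeled as Gaussian purely for simplicity; none of this reflects the authors' actual pipeline.

```python
import math
import random
import statistics

# Hedged sketch of a score-based likelihood ratio (SLR):
#   SLR(s) = f(s | same source) / f(s | different source)
# using simulated comparison scores and Gaussian density estimates.
random.seed(0)
same_scores = [random.gauss(2.0, 1.0) for _ in range(500)]   # same-source pairs
diff_scores = [random.gauss(-1.0, 1.0) for _ in range(500)]  # different-source pairs

def gauss_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def slr(score: float) -> float:
    """Ratio of the score's density under the same-source vs. the
    different-source score distributions."""
    f_same = gauss_pdf(score, statistics.mean(same_scores),
                       statistics.stdev(same_scores))
    f_diff = gauss_pdf(score, statistics.mean(diff_scores),
                       statistics.stdev(diff_scores))
    return f_same / f_diff

print(slr(2.0))   # score typical of same-source pairs: SLR well above 1
print(slr(-1.0))  # score typical of different-source pairs: SLR well below 1
```

The scarcity problem the abstract addresses arises because estimating the same-source score distribution for a *specific* source requires many control items from that source, which is where synthetic data generation comes in.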

Keywords

forensics
random forest
SMOTE
resampling
data augmentation
handwriting

Co-Author

Federico Veneri, Iowa State University

Speaker

Danica Ommen, Iowa State University