Monday, Aug 4: 10:30 AM - 12:20 PM
0642
Topic-Contributed Paper Session
Music City Center
Room: CC-212
Applied
Yes
Main Sponsor
Forensic Statistics Interest Group
Co-Sponsors
Advisory Committee on Forensic Science
Committee on Law and Justice Statistics
Presentations
Multiple reviews, including those by the National Academy of Sciences (2009), PCAST (2016), and AAAS (2017), have concluded that forensic latent fingerprint comparison lacks empirical validation. Scientific validity requires rigorously designed studies of examiner performance: accuracy, repeatability, and reproducibility. We performed a systematic review of black-box studies evaluating latent fingerprint comparisons and found that all suffer from fundamental design and statistical flaws. These flaws (including inadequate sample sizes, non-representative samples and test conditions, improper handling of inconclusives, and flawed error rate estimation) render the studies invalid for establishing the field's scientific validity. Furthermore, these studies omit key elements of real casework, such as database searches (AFIS), contextual bias, and real-world complexity. As a result, error rates of latent fingerprint examiners remain unknown, and claims of reliability and accuracy lack scientific support. We offer recommendations for future studies to ensure valid experimental design, statistical analysis, and real-world relevance, advancing the field toward scientific rigor and admissibility.
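To illustrate one of the statistical flaws named above, here is a minimal sketch (all counts hypothetical, not from any cited study) of how the treatment of inconclusive responses alone can swing a reported false-positive rate:

```python
# Illustrative sketch: how the handling of inconclusive responses changes
# a reported false-positive rate. All counts below are hypothetical.

def false_positive_rate(false_pos, correct_excl, inconclusive, treat_as):
    """FPR on different-source trials under three conventions for
    inconclusives: 'exclude' them, count them as 'correct', or as 'error'."""
    if treat_as == "exclude":
        return false_pos / (false_pos + correct_excl)
    if treat_as == "correct":
        return false_pos / (false_pos + correct_excl + inconclusive)
    if treat_as == "error":
        return (false_pos + inconclusive) / (false_pos + correct_excl + inconclusive)
    raise ValueError(treat_as)

# Hypothetical study: 1,000 different-source comparisons with
# 5 false positives, 600 correct exclusions, 395 inconclusives.
for rule in ("exclude", "correct", "error"):
    print(rule, round(false_positive_rate(5, 600, 395, rule), 4))
# The same data yield rates from under 1% to 40% depending on the convention.
```

The spread across conventions is exactly why the abstract flags "improper handling of inconclusives" as fatal to error-rate claims.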
When wires are cut, the tool produces striations on the cut surface; as in other forms of forensic analysis, these striation marks are used to connect the evidence to the source that created them. Here, we argue that comparing two wire cut surfaces introduces complexities not present in better-investigated forms of toolmark examination, such as the striations observed on bullets: wire comparisons inherently require multiple distinct comparisons, increasing the expected false discovery rate. We call attention to the multiple comparison problem in wire examination and relate it to other situations in forensics that involve multiple comparisons, such as database searches.
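The core arithmetic of the multiple comparison problem can be sketched as follows, under the simplifying assumptions of independent comparisons and a fixed, hypothetical per-comparison false-positive probability:

```python
# Hedged sketch: why many pairwise comparisons inflate the chance of a
# spurious "match." Assumes independent comparisons with a fixed
# per-comparison false-positive probability p (a simplification).

def prob_at_least_one_false_match(p, n):
    """Family-wise probability of at least one false positive in n comparisons."""
    return 1 - (1 - p) ** n

p = 0.01  # hypothetical per-comparison false-positive rate
for n in (1, 10, 100):
    print(n, round(prob_at_least_one_false_match(p, n), 3))
# At p = 0.01, one comparison risks ~1%; a hundred comparisons risk ~63%.
```

The same calculation applies to database searches: searching against many candidate sources multiplies the opportunities for a coincidental correspondence.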
Forensic shoeprint examiners have been criticized for relying on subjective rather than objective, quantitative methods to determine whether a suspect's shoe sole matches a crime scene impression. After confirming a match in pattern, size, and wear, experts assess whether randomly acquired characteristics (RACs) correspond. A correspondence of rare RACs supports the hypothesis that the two impressions originate from the same source.
This study, conducted with the Israel Police and the Center for Statistics and Applications in Forensic Evidence, examines RAC visibility under crime-scene-like conditions rather than ideal lab settings. In controlled experiments, pre-selected shoes featuring numerous RACs were used to simulate thefts, leaving impressions on various surfaces. A total of 302 shoeprints from 30 simulated crime scenes were analyzed, identifying 488 RACs.
The study assesses how RAC visibility varies by print quality and surface type and develops a probabilistic model estimating the likelihood of RACs appearing in specific locations. The collected data serve as a basis for black-box studies on examiner decision-making and for improving the reliability of RAC analysis in forensic practice.
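One plausible form for the kind of probabilistic model described above is a logistic model for RAC visibility; the sketch below is purely illustrative, with invented coefficients and feature encodings that are not taken from the study:

```python
# Hypothetical sketch of a visibility model: P(RAC visible) as a logistic
# function of print quality and surface characteristics. The coefficients
# and the [0, 1] feature encodings are invented for illustration only.
import math

def rac_visibility_prob(quality, surface_hardness, b0=-2.0, b_q=3.0, b_s=1.0):
    """P(RAC visible) via a logistic link; inputs scaled to [0, 1]."""
    z = b0 + b_q * quality + b_s * surface_hardness
    return 1 / (1 + math.exp(-z))

print(round(rac_visibility_prob(0.9, 0.8), 2))  # high-quality print, hard surface
print(round(rac_visibility_prob(0.2, 0.3), 2))  # poor print, soft surface
```

A fitted model of this shape would let examiners weight an observed (or absent) RAC correspondence by how likely that RAC was to be visible at all under the recovery conditions.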
In most forensic science disciplines, there is no objective way to determine whether two fingerprints, bullets, or handwritten documents come from the same person or not. Instead, it is the responsibility of individual analysts to make subjective conclusions and communicate their results to a judge or jury. Some examiners appear to be well-calibrated with each other and utilize the full range of possible outcomes, while other examiners display a tendency to overuse some categories and underuse others. Recently, there have been proposals to shift from a 3-category scale (e.g., "same source", "different source", and "inconclusive") to 5 or 7 category scales (e.g., "strong support for same source", "moderate support for same source", etc.). Using Item Response Theory-based tools from standardized testing, we quantify these differences and illustrate potential implications of different reporting scales using simulation. Since examiner conclusions can influence investigator, judge, and jury decisions, it is important to measure and understand the range of individual differences in reporting styles before adopting a more complicated scale.
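The category-usage differences described above can be illustrated with a toy simulation (thresholds and latent-score distribution invented here, not taken from the abstract): two examiners see the same latent evidence strength but place their category cutoffs differently, which is the graded-response intuition behind the IRT analysis.

```python
# Illustrative simulation (all parameters invented): examiners differ in
# where they place thresholds on a shared latent evidence-strength scale,
# so identical comparisons yield different category-usage profiles.
import random
from collections import Counter

def report(latent, cuts, labels=("different source", "inconclusive", "same source")):
    """Map a latent score to a 3-category conclusion via examiner thresholds."""
    if latent < cuts[0]:
        return labels[0]
    if latent < cuts[1]:
        return labels[1]
    return labels[2]

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(10_000)]  # shared latent evidence

cautious = Counter(report(s, cuts=(-1.5, 1.5)) for s in scores)  # wide inconclusive zone
decisive = Counter(report(s, cuts=(-0.2, 0.2)) for s in scores)  # narrow inconclusive zone
print("cautious:", cautious["inconclusive"], "inconclusives")
print("decisive:", decisive["inconclusive"], "inconclusives")
```

On identical evidence, the "cautious" examiner reports many times more inconclusives than the "decisive" one; finer 5- or 7-point scales add more thresholds and therefore more ways for examiners to diverge.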
The specific source problem refers to a type of inference in forensic science where the aim is to assess whether a particular source generated the evidence or whether it was generated by an alternative, unknown source. Score-based likelihood ratios (SLR) quantify the relative likelihood of the evidence under both propositions for complex features. This analysis requires a conditional inference, but data for the specific source (e.g., control items related to the person of interest) are often scarce, making this approach practically infeasible. Furthermore, the dependence structure created by the current procedure for generating data for machine learning algorithms can lead to reduced performance of such SLR systems. To address this, we propose creating synthetic items to train machine learning algorithms for the specific source problem. Simulation results show that our approach achieves a high level of agreement with an ideal scenario where data are not a limitation and where the data are independent. We also present real-world applications in forensic sciences.
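Given the session's keywords (SMOTE, random forest, data augmentation), the synthetic-item idea can be sketched with a minimal SMOTE-style interpolation; the feature vectors below are toy data, and a real system would feed the augmented set to a classifier such as a random forest:

```python
# Minimal SMOTE-style sketch for the scarce specific-source class: each
# synthetic item is interpolated between a real item and one of its
# nearest same-class neighbours. Toy 2-D features for illustration only.
import random

def smote_like(items, n_new, k=2, rng=random.Random(42)):
    """Generate n_new synthetic feature vectors by convex interpolation."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(items)
        # k nearest neighbours of `base` by squared Euclidean distance
        neighbours = sorted(
            (x for x in items if x is not base),
            key=lambda x: sum((a - b) ** 2 for a, b in zip(base, x)),
        )[:k]
        nb = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, nb)))
    return synthetic

# Three control items from the person of interest (hypothetical features).
control = [(0.0, 1.0), (0.2, 0.8), (0.1, 1.1)]
augmented = control + smote_like(control, n_new=5)
print(len(augmented))  # augmented training set for the specific-source class
```

Because each synthetic point is a convex combination of two real control items, it stays inside the region spanned by the observed specific-source data rather than inventing features outside it.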
Keywords
forensics
random forest
SMOTE
resampling
data augmentation
handwriting