Session 6: Specific Source Machine Learning Score-based Likelihood Ratios for Forensic Evidence
Presentation Type
Invited
Student
No
Track
Forensic Statistics
Abstract
Source identification is an inferential problem that evaluates the likelihood of opposing propositions regarding the origin of items. The specific source problem is a type of source identification in which the researcher assesses whether a particular source generated the items or whether they came from an alternative, unknown source. Score-based likelihood ratios offer a method to assess the relative likelihood of the two propositions when formulating a probabilistic model is challenging or infeasible, as with pattern evidence in forensic science. Addressing the specific source question with a likelihood ratio requires conditional inference, but data from the specific source (e.g., control items related to the person of interest) are often scarce, making this approach practically infeasible. Furthermore, the dependence structure created by the current procedure for generating data for machine learning algorithms can reduce the performance of such score-based likelihood ratio systems. To address both issues, we propose a resampling plan that creates synthetic items to generate learning instances for the specific source problem. Simulation results show that our approach achieves a high level of agreement with an ideal scenario in which data are not a limiting factor and are independent. We also present two applications in forensic science, handwriting and glass analysis, illustrating our approach with both a distance-based and a machine learning-based score. These applications show that our method may outperform current alternatives in the literature, providing a feasible specific source approach for forensic casework.
Start Date
2-7-2025 11:00 AM
End Date
2-7-2025 12:00 PM
Pasque (Room 255)