Development of a Probabilistic Multi-Class Model Selection Algorithm for High-Dimensional and Complex Data
Dissertation - Open Access
Doctor of Philosophy (PhD)
Department / School
Mathematics and Statistics
The development of quantifiable measures of uncertainty in forensic conclusions has resulted in the debut of several ad-hoc methods for approximating the weight of evidence (WoE). In particular, forensic researchers have attempted to use similarity measures, or scores, to approximate the weight of evidence characterized by high-dimensional and complex data. Score-based methods have been proposed to approximate the WoE for numerous evidence types (e.g., fingerprints, handwriting, inks, voice analysis). In general, score-based methods consider the score as a projection onto the real line. For example, the score-based likelihood ratio compares the likelihoods of a score calculated between two objects under two density functions, each based on the sampling distribution of the score under one of two mutually exclusive propositions. Other score-based methods have been proposed [6, 7, 31, 82], which do not rely on such a ratio. This dissertation focuses on a class of kernel-based algorithms that fall in the latter group of score-based methods. It introduces a model that completes the class of kernel-based algorithms initiated under NIJ Awards 2009-DN-BX-K234 and 2015-R2-CX-0028, which addressed the “outlier detection” and “common source” problems, by proposing a fully probabilistic model for the “specific source” problem. The “specific source” problem is addressed in three progressive models: first, the problem is addressed for a pair of fixed sources; next, the two-class model is extended to consider multiple fixed sources; finally, a kernel-based model selection algorithm is developed to consider a single fixed source juxtaposed with multiple random sources. This class of algorithms relates pairs of high-dimensional, complex objects through a kernel function to obtain a vector of within-source and between-source scores, and capitalizes on the variability that exists within and between these sets of scores.
The model makes no assumptions about the type or dimension of data to which it can be applied, and can be tailored to any type of data by modifying the kernel function at the core of the model. In addition, this algorithm provides a naturally probabilistic, multi-class, and compact alternative to current kernel-based pattern recognition methods such as support vector machines, relevance vector machines, and approximate Bayesian computation methods.
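For readers unfamiliar with the score-based likelihood ratio mentioned in the abstract, the following is a minimal illustrative sketch and not the algorithm developed in the dissertation, which belongs to the group of methods that do not rely on such a ratio. It assumes a Gaussian (RBF) kernel as the score, simulated 5-dimensional data, and Gaussian kernel density estimates of the two score distributions; the function names, bandwidth, and data-generating choices are all hypothetical.

```python
import math
import random


def rbf_score(x, y, gamma=0.5):
    """Hypothetical similarity score: Gaussian (RBF) kernel between two vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(s):
        return norm * sum(
            math.exp(-0.5 * ((s - t) / bandwidth) ** 2) for t in samples
        )

    return density


def score_based_lr(score, same_source_scores, diff_source_scores, bandwidth=0.05):
    """Ratio of the score's estimated density under the two propositions."""
    numerator = kde(same_source_scores, bandwidth)(score)
    denominator = kde(diff_source_scores, bandwidth)(score)
    return numerator / denominator


# Simulated data (an assumption for illustration only): objects from a
# source are 5-dimensional Gaussian vectors centred on that source's mean.
rng = random.Random(0)


def draw(mean, dim=5, sd=0.3):
    return [rng.gauss(mean, sd) for _ in range(dim)]


# Scores between pairs of objects from the same source ...
same_scores = [rbf_score(draw(0.0), draw(0.0)) for _ in range(200)]
# ... and between pairs of objects from two different sources.
diff_scores = [rbf_score(draw(0.0), draw(1.0)) for _ in range(200)]

lr_high = score_based_lr(0.70, same_scores, diff_scores)  # high-similarity score
lr_low = score_based_lr(0.05, same_scores, diff_scores)   # low-similarity score
print(lr_high > 1.0, lr_low < 1.0)
```

In this sketch a ratio above 1 favours the same-source proposition and a ratio below 1 favours the different-source proposition. Only the score function touches the raw objects, which reflects the abstract's point that a kernel-based method can be adapted to a new data type by changing the kernel.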
Library of Congress Subject Headings
Forensic sciences -- Data processing.
Number of Pages
South Dakota State University
Ausdemore, Madeline Anne, "Development of a Probabilistic Multi-Class Model Selection Algorithm for High-Dimensional and Complex Data" (2021). Electronic Theses and Dissertations. 5207.