# Probabilistic Foundations for the Use of the Logistic Regression Bayes Factor in Forensic Source Identification

## Presentation Type

Poster

## Student

Yes

## Track

Forensic Statistics

## Abstract

Compared with likelihood ratios (LRs), Bayes factors (BFs) have the advantage of accounting for uncertainty in model parameter values in a logical and coherent manner. In the forensic literature, BFs are commonly calculated for generative models, whereas LRs are commonly calculated for discriminative models, for example by using maximum likelihood (ML) estimates of logistic regression parameters. In this report, we present an approach for calculating BFs when logistic regression is used as a model to discriminate between two classes. In logistic regression, the log of the LR between the two classes follows a specified functional form; we focus on the case where this form is linear, which is equivalent to the log of the posterior odds of group membership following a linear model. We propose two calculations of the BF: one based on the posterior odds, and one using the LR function in the context of Ommen and Saunders (2021). Using a database of simulated observations generated under two different models, we obtain a posterior distribution for the logistic regression parameters and use it to obtain the posterior odds of group membership for a new observation of unknown membership. Dividing these posterior odds by the prior odds gives the corresponding BF. Importantly, constructing the database with a prespecified number of observations under each model fixes the base rates. This removes the Bernoulli sampling process for the labels that is used to construct the likelihood function for the logistic regression, a point we discuss in the context of McLachlan (2004). As a result, our discriminative model is an approximation to the latent generative models of the two classes. We study the convergence of the BF to the LR for the two BF calculations and show that both converge for large sample sizes. We also compare the BFs from the two approaches to a reference BF, the LR, and the plug-in estimate of the LR.
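
The posterior-odds route described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the poster: the two class models (unit-variance normals with means 0 and 1), the flat prior on the regression coefficients, the Laplace (normal) approximation used in place of a full posterior sampler, and all function names. The prior odds are taken to be the fixed database ratio `n1 / n0`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate_database(n0, n1, rng):
    # Assumed generative models: class 0 ~ N(0, 1), class 1 ~ N(1, 1).
    # The class counts are prespecified, so the base rates are fixed.
    x = np.concatenate([rng.normal(0.0, 1.0, n0), rng.normal(1.0, 1.0, n1)])
    y = np.concatenate([np.zeros(n0), np.ones(n1)])
    return x, y

def fit_logistic(x, y, iters=25):
    # ML fit of a linear logistic regression by Newton's method.
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = sigmoid(X @ beta)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    # Observed information at the final estimate.
    p = sigmoid(X @ beta)
    hess = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, hess

def plug_in_log_lr(x_new, beta, n0, n1):
    # Log posterior odds at the ML estimate minus log prior odds.
    return beta[0] + beta[1] * x_new - np.log(n1 / n0)

def approx_log_bf(x_new, beta, hess, n0, n1, rng, draws=20000):
    # Assumption: flat prior, posterior approximated by N(beta_hat, hess^-1).
    samples = rng.multivariate_normal(beta, np.linalg.inv(hess), size=draws)
    # Posterior probability of class 1, averaged over parameter draws.
    p1 = sigmoid(samples[:, 0] + samples[:, 1] * x_new).mean()
    return np.log(p1 / (1.0 - p1)) - np.log(n1 / n0)

def true_log_lr(x_new):
    # log N(x; 1, 1) - log N(x; 0, 1) for the assumed models.
    return x_new - 0.5

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n0, n1 = 2000, 6000
    x, y = simulate_database(n0, n1, rng)
    beta, hess = fit_logistic(x, y)
    print("plug-in log LR:", plug_in_log_lr(0.8, beta, n0, n1))
    print("approx. log BF:", approx_log_bf(0.8, beta, hess, n0, n1, rng))
    print("true log LR:   ", true_log_lr(0.8))
```

In this toy setting the approximate BF and the plug-in LR should move toward the true LR as the database grows, which mirrors the large-sample convergence stated in the abstract. The sketch does not reproduce the second BF calculation or the reference BF mentioned there.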

## Start Date

2-6-2024 1:00 PM

## End Date

2-6-2024 2:00 PM

## Author

Volstorff A
