Session 12: Forensic Handwriting Identification using Random Forests and Score-based Likelihood Ratios

Danica Ommen, Iowa State University
Madeline Q. Johnson, Boston Scientific

Abstract

Handwriting analysis is conducted by forensic document examiners, who visually compare characteristics of writing to assess propositions about who wrote a document. Recently, there have been incentives to quantify the similarity between two written documents in order to support the conclusions drawn by experts. To this end, we use an automatic algorithm in the open-source ‘handwriter’ package in R to decompose a handwritten sample into small graphical units of writing, called graphs. These graphs are sorted into exemplar groups, or clusters. We assume that the frequency with which a writer produces graphs assigned to each cluster is characteristic of that writer's handwriting. Given a pair of handwritten documents, we then use the difference between their vectors of cluster frequencies as the input to a random forest, and the output of the random forest serves as the similarity score. We estimate the densities of similarity scores computed from many pairs of documents whose source attribution is known, and use these densities to obtain score-based likelihood ratios (SLRs). We find that several different types of SLRs can successfully indicate the strength of evidence for writership determinations.
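The last steps of the pipeline described above can be illustrated with a minimal sketch. It is written in Python with the standard library only and does not use the ‘handwriter’ package or a random forest: the similarity scores are assumed to be given already, the absolute difference between frequency vectors is one assumed choice of "difference", and the Gaussian kernel density estimate with a fixed bandwidth is an assumed choice of density estimator. All function names and numbers are hypothetical and for illustration only.

```python
import math
from collections import Counter


def cluster_frequencies(assignments, n_clusters):
    """Relative frequency of graphs per cluster for one document.

    `assignments` holds the cluster index (0..n_clusters-1) of each graph.
    """
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def frequency_difference(doc_a, doc_b, n_clusters):
    """Element-wise absolute difference of two cluster-frequency vectors.

    Assumption: the abstract only says "difference"; absolute value is one option.
    """
    fa = cluster_frequencies(doc_a, n_clusters)
    fb = cluster_frequencies(doc_b, n_clusters)
    return [abs(x - y) for x, y in zip(fa, fb)]


def gaussian_kde(samples, bandwidth):
    """Return a Gaussian kernel density estimate built from `samples`."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


def score_based_lr(score, same_scores, diff_scores, bandwidth=0.1):
    """SLR = density of the score under same-writer pairs
    divided by its density under different-writer pairs."""
    f_same = gaussian_kde(same_scores, bandwidth)
    f_diff = gaussian_kde(diff_scores, bandwidth)
    return f_same(score) / f_diff(score)


# Made-up similarity scores from pairs with known source attribution.
same_writer_scores = [0.80, 0.85, 0.90, 0.75, 0.95]
different_writer_scores = [0.10, 0.20, 0.15, 0.30, 0.25]

# Feature vector for a hypothetical pair of documents with 3 clusters.
features = frequency_difference([0, 0, 1, 2], [0, 1, 1, 2], n_clusters=3)

# SLR for a hypothetical questioned pair whose similarity score is 0.85.
slr = score_based_lr(0.85, same_writer_scores, different_writer_scores)
print(features, slr)
```

An SLR above 1 supports the same-writer proposition and an SLR below 1 supports the different-writer proposition. In this sketch a score far from all reference scores can drive a density to zero numerically, so a practical version would need to handle that case.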

 
Feb 8th, 2:30 PM to 3:25 PM

Herold Crest 253 C
