Score-based Likelihood Ratios Using Stylometric Text Embeddings

Abstract Number:

2680 

Submission Type:

Contributed Abstract 

Contributed Abstract Type:

Poster 

Participants:

Rachel Longjohn (1), Padhraic Smyth (2)

Institutions:

(1) N/A, N/A, (2) University of California, Irvine, N/A

Co-Author:

Padhraic Smyth  
University of California, Irvine

First Author:

Rachel Longjohn  
N/A

Presenting Author:

Rachel Longjohn  
N/A

Abstract Text:

We consider the setting in which we have two sets of texts in digital form and would like to quantify our belief that the two sets were written by the same author rather than by two different authors. Motivated by problems in digital forensics, we allow for sets of texts that consist primarily of short-form messages and for texts by the same author that cover vastly different topics. To this end, we focus on author-specific stylometric aspects of the texts that are consistent across an author's writings and invariant to topic. Recent work in machine learning has sought to learn a mapping from input texts to vector representations intended to capture such stylometric features. In this work, we investigate the use of such stylometric text embeddings to construct a score-based likelihood ratio (SLR), an increasingly popular way of quantifying evidence in forensics. We present the results of SLR experiments using recently proposed stylometric embeddings from machine learning applied to real-world datasets relevant to digital forensics.
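
As a rough illustration of how a score-based likelihood ratio can be built from embeddings, the sketch below reduces two sets of embedding vectors to a single similarity score, then divides the estimated density of that score among known same-author pairs by its estimated density among known different-author pairs. Everything specific here is an assumption made for illustration and is not taken from the abstract: the score (cosine similarity of mean embeddings), the density estimator (a Gaussian kernel density estimate with a normal-reference bandwidth), and all function names are hypothetical. The embedding model itself is not shown; the inputs are assumed to be lists of numeric vectors.

```python
import math
import statistics


def mean_vector(vectors):
    """Average a set of equal-length embedding vectors into one vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def score(set_a, set_b):
    """Similarity score between two sets of embeddings.

    Assumed choice: cosine similarity of the two mean embeddings.
    """
    return cosine_similarity(mean_vector(set_a), mean_vector(set_b))


def gaussian_kde(samples):
    """Return a 1-D Gaussian kernel density estimate as a function.

    Bandwidth uses the normal-reference rule 1.06 * sd * n^(-1/5).
    """
    n = len(samples)
    h = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = n * h * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

    return density


def score_based_lr(observed_score, same_author_scores, diff_author_scores):
    """SLR: density of the observed score under the same-author reference
    scores divided by its density under the different-author reference scores.
    """
    f_same = gaussian_kde(same_author_scores)
    f_diff = gaussian_kde(diff_author_scores)
    return f_same(observed_score) / f_diff(observed_score)


# Toy reference scores (made up for illustration, not real data).
same_author_scores = [0.80, 0.85, 0.90, 0.95]
diff_author_scores = [0.10, 0.20, 0.30, 0.40]

slr_high = score_based_lr(0.90, same_author_scores, diff_author_scores)
slr_low = score_based_lr(0.20, same_author_scores, diff_author_scores)
```

Under these toy reference scores, an SLR above 1 favors the same-author hypothesis and an SLR below 1 favors the different-author hypothesis. In practice the reference scores would come from pairs of text sets with known authorship, and the density estimates can become unreliable for observed scores far from either reference sample.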

Keywords:

digital forensics|authorship analysis|large language models|machine learning|idiolect|text data

Sponsors:

Section on Text Analysis

Tracks:

Miscellaneous

I have read and understand that JSM participants must abide by the Participant Guidelines.

Yes

I understand that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is non-refundable.

I understand