Statistical Foundations of Large Language Models: From Theory to Practice

Chair: Weijie Su, University of Pennsylvania
Organizer: Weijie Su, University of Pennsylvania

Tuesday, Aug 5, 2:00 PM - 3:50 PM
Session 0440: Invited Paper Session
Music City Center, Room CC-101A

Keywords

Large language models

Foundation models

Artificial intelligence

Statistical foundations

Generative AI

ChatGPT 

Applied: No

Main Sponsor

IMS

Co-Sponsors

Section on Nonparametric Statistics
Section on Statistical Learning and Data Science

Presentations

Large language model validity via enhanced conformal prediction methods

We develop new conformal inference methods for obtaining validity guarantees on the output of large language models (LLMs). Prior work in conformal language modeling identifies a subset of the generated text that satisfies a high-probability guarantee of correctness. These methods work by filtering claims from the LLM's original response whenever a scoring function evaluated on the claim fails to exceed a threshold calibrated via split conformal prediction. Existing methods in this area suffer from two deficiencies. First, the stated guarantee is not conditionally valid: the trustworthiness of the filtering step may vary with the topic of the response. Second, because the scoring function is imperfect, the filtering step can remove many valuable and accurate claims. We address both challenges via two new conformal methods. First, we generalize the conditional conformal procedure of Gibbs et al. (2023) to adaptively issue weaker guarantees when doing so is required to preserve the utility of the output. Second, we show how to systematically improve the quality of the scoring function via a novel algorithm for differentiating through the conditional conformal procedure. We demonstrate the efficacy of our approach on both synthetic and real-world datasets.
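
To make the filtering step concrete, the sketch below shows a minimal split conformal calibration of the threshold, assuming a claim-level scoring function and a calibration set of responses with correctness labels; the function names and data format are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

def calibrate_threshold(cal_scores, cal_correct, alpha=0.1):
    """Split conformal calibration for claim filtering (illustrative sketch).

    cal_scores[i]  -- array of claim scores for calibration response i
    cal_correct[i] -- boolean array: which of those claims are correct
    Returns a threshold tau such that keeping only claims scoring above tau
    retains no incorrect claim with probability >= 1 - alpha on a new response.
    """
    conf_scores = []
    for scores, correct in zip(cal_scores, cal_correct):
        scores = np.asarray(scores, dtype=float)
        wrong = scores[~np.asarray(correct, dtype=bool)]
        # Smallest threshold that would remove every incorrect claim in this response.
        conf_scores.append(wrong.max() if wrong.size else -np.inf)
    n = len(conf_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample conformal quantile
    return np.sort(conf_scores)[min(k, n) - 1]

def filter_claims(scores, tau):
    """Indices of claims retained by the calibrated filter."""
    return [i for i, s in enumerate(scores) if s > tau]
```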

Keywords

Large language models

conformal inference 

Speaker

Emmanuel Candes, Stanford University

Robust Detection and Proportion Estimation of Statistical Language Watermarks

Since the introduction of ChatGPT in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), a technique known as watermarking, has emerged as a principled approach for provably distinguishing LLM-generated text from human-written content. This talk addresses two critical challenges in this domain: (1) robust detection of watermarks, and (2) provable estimation of the proportion of watermarked content when users edit the watermarked text generated by LLMs.
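
For context, one widely used scheme referenced below is the Gumbel-max watermark. The sketch here illustrates the basic idea of seeding the sampling noise with a keyed hash of the preceding tokens so that a detector holding the key can reconstruct the same noise; the hashing choice and function names are illustrative assumptions, not a specific published implementation.

```python
import hashlib
import numpy as np

def seeded_uniforms(prev_tokens, key, vocab_size):
    """Pseudorandom U(0,1) draws, one per vocabulary entry, reproducible
    from the secret key and the preceding context (illustrative hashing)."""
    digest = hashlib.sha256((key + ",".join(map(str, prev_tokens))).encode()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "little"))
    return rng.random(vocab_size)

def gumbel_max_sample(log_probs, prev_tokens, key):
    """Pick the next token via the Gumbel-max trick with watermark-seeded noise.

    argmax(log_probs + Gumbel noise) is distributed as softmax(log_probs),
    so the text distribution is unchanged while the noise leaves a detectable trace.
    """
    u = seeded_uniforms(prev_tokens, key, len(log_probs))
    gumbel = -np.log(-np.log(u))  # Gumbel(0,1) noise
    return int(np.argmax(np.asarray(log_probs) + gumbel))
```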

For robust detection, we introduce a statistical framework that models the problem as a mixture detection task. Our approach employs a family of Truncated Goodness-of-Fit (Tr-GoF) tests, which we demonstrate to be optimally robust in two key ways: (i) achieving the optimal detection boundary as the watermark signal asymptotically diminishes, and (ii) attaining the highest detection efficiency in the presence of constant modifications. In contrast, existing sum-based detection methods for Gumbel-max watermarks fail to meet these benchmarks without relying on additional problem-specific information. Simulations validate these theoretical guarantees, and real-world experiments confirm the superior or comparable performance of our method in maintaining watermark detectability, particularly in low-temperature settings.
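
The sketch below illustrates a goodness-of-fit detector of this flavor under simplifying assumptions: each token yields a p-value that is Uniform(0,1) for human text and stochastically smaller for watermarked text, and the statistic is truncated to the smallest p-values. It is a higher-criticism-style stand-in, not the exact Tr-GoF statistic analyzed in the talk.

```python
import numpy as np

def truncated_gof_statistic(pvals, s0=0.1):
    """Goodness-of-fit statistic restricted (truncated) to the smallest
    per-token p-values; large values indicate watermark signal.
    Illustrative only -- not the paper's Tr-GoF statistic."""
    p = np.sort(np.asarray(pvals, dtype=float))
    n = len(p)
    emp = np.arange(1, n + 1) / n                      # empirical CDF at p_(i)
    hc = np.sqrt(n) * (emp - p) / np.sqrt(p * (1 - p) + 1e-12)
    keep = p <= s0                                     # truncation to small p-values
    return float(hc[keep].max()) if keep.any() else 0.0
```

In practice the statistic would be compared with a Monte Carlo or asymptotic threshold computed under the null of fully human-written text.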

For estimating the proportion of watermarked content, we establish the identifiability of this proportion across three watermarking schemes and propose efficient estimators to quantify the proportion of watermarked subtexts within mixed-source texts. These estimators achieve minimax optimality and perform strongly on both simulated and real datasets.
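
As a toy illustration of proportion estimation in a mixture of watermarked and human-written tokens, the sketch below uses a method-of-moments estimator under an assumed pivotal statistic whose mean is known under each source; this is an illustrative assumption, not one of the minimax-optimal estimators from the talk.

```python
import numpy as np

def estimate_watermark_proportion(pivots, mu1):
    """Method-of-moments estimate of the fraction of watermarked tokens,
    assuming E[pivot] = 0.5 for human-written tokens and mu1 (> 0.5) for
    watermarked tokens. Clipped to [0, 1]. Illustrative sketch only."""
    mean = float(np.mean(pivots))
    eps = (mean - 0.5) / (mu1 - 0.5)
    return float(np.clip(eps, 0.0, 1.0))
```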

Keywords

Watermark

LLM

Robust detection

Mixture model

Proportion estimation 

Speaker

Xiang Li

A Plug-and-Play Watermark Framework for AI-Generated Images

Safeguarding intellectual property and preventing potential misuse of AI-generated images are of paramount importance. This talk introduces a robust and agile plug-and-play watermark detection framework, dubbed RAW. Departing from traditional encoder-decoder methods, which incorporate fixed binary codes as watermarks within latent representations, our approach injects learnable watermarks directly into the original image data. A classifier jointly trained with the watermark then detects its presence. The proposed framework is compatible with various generative architectures and supports on-the-fly watermark injection after training. By incorporating state-of-the-art smoothing techniques, we show that the framework provides provable guarantees on the false positive rate for misclassifying a watermarked image, even in the presence of certain adversarial attacks targeting watermark removal. Experiments on a diverse range of images generated by state-of-the-art diffusion models reveal substantial performance enhancements over existing approaches. For instance, our method demonstrates a notable increase in AUROC, from 0.48 to 0.82, compared to state-of-the-art approaches in detecting watermarked images under adversarial attacks, while maintaining image quality, as indicated by closely aligned FID and CLIP scores.
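
The sketch below gives a minimal PyTorch-style rendering of the joint training idea: a learnable additive perturbation serves as the watermark, and a small classifier is trained to distinguish watermarked from clean images. The architecture, visibility budget, and random data are placeholders, not the RAW implementation.

```python
import torch
import torch.nn as nn

class JointWatermarkSketch(nn.Module):
    """Toy plug-and-play watermark: a learnable perturbation added to images,
    trained jointly with a detector (illustrative, not the RAW architecture)."""

    def __init__(self, image_shape=(3, 64, 64), eps=4 / 255):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(1, *image_shape))  # learnable watermark
        self.eps = eps                                           # visibility budget
        self.detector = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def watermark(self, x):
        # Keep the perturbation small so image quality is preserved.
        return (x + self.eps * torch.tanh(self.delta)).clamp(0, 1)

    def forward(self, x):
        wm = self.watermark(x)
        logits = torch.cat([self.detector(wm), self.detector(x)])
        labels = torch.cat([torch.ones(len(x), 1), torch.zeros(len(x), 1)])
        return nn.functional.binary_cross_entropy_with_logits(logits, labels)

# One illustrative training step on random data (placeholder for real images).
model = JointWatermarkSketch()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = model(torch.rand(8, 3, 64, 64))
loss.backward()
opt.step()
```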

Keywords

watermark

data security 

Speaker

Xuan Bi

Presentation

Speaker

Mengdi Wang, Princeton University