The Annals of Statistics Invited Papers

Chair: Lan Wang, University of Miami, Herbert Business School
Organizers: Lan Wang, University of Miami, Herbert Business School; Enno Mammen

Monday, Aug 4, 2:00 PM - 3:50 PM
Session 0401: Invited Paper Session
Music City Center, Room CC-201A

Applied: No

Main Sponsor

IMS

Co-Sponsors

Section on Statistical Learning and Data Science

Presentations

A Statistical Framework of Watermarks for Large Language Models

Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), a technique known as watermarking, has served as a principled approach to provably distinguishing LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical efficiency of watermarks and for designing powerful detection rules. Inspired by the hypothesis testing formulation of watermark detection, our framework starts by selecting a pivotal statistic of the text and a secret key, provided by the LLM to the verifier, to control the false positive rate (the error of mistakenly detecting human-written text as LLM-generated). Next, the framework allows one to evaluate the power of watermark detection rules by obtaining a closed-form expression for the asymptotic false negative rate (the error of incorrectly classifying LLM-generated text as human-written). Our framework further reduces the problem of determining the optimal detection rule to solving a minimax optimization program. We apply this framework to two representative watermarks, one of which has been internally implemented at OpenAI, and obtain several findings that can guide the practice of implementing watermarks. In particular, we derive optimal detection rules for these watermarks under our framework. Numerical experiments demonstrate that these theoretically derived detection rules are competitive with, and sometimes more powerful than, existing detection approaches.
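As a rough illustration of the pivotal-statistic recipe (a minimal sketch, not the paper's construction), the Python snippet below implements a sum-based detection rule of the kind used for Gumbel-type watermarks. The hashing scheme, context-window length, and function names are assumptions made for this example; the statistical content is only that, for human-written text, the key-seeded pivotal values behave as independent Uniform(0, 1) draws, which pins down the null distribution of the test statistic.

```python
import hashlib

import numpy as np
from scipy import stats

def pivotal_values(tokens, key, context=4):
    """Recover the key-seeded uniform attached to each observed token by
    hashing the secret key together with the preceding `context` tokens.
    This hashing scheme is hypothetical and stands in for the watermark's
    actual pseudorandom construction."""
    us = []
    for t in range(context, len(tokens)):
        material = f"{key}|{tokens[t - context:t]}|{tokens[t]}".encode()
        digest = hashlib.sha256(material).digest()
        us.append(int.from_bytes(digest[:8], "big") / 2**64)  # uniform in [0, 1)
    return np.array(us)

def detect(tokens, key, alpha=0.01):
    """Sum-based detection rule. Under the null (human-written text) each
    pivotal value U_t is Uniform(0, 1), so sum_t -log(1 - U_t) follows a
    Gamma(n, 1) distribution; rejecting above its (1 - alpha) quantile
    controls the false positive rate at level alpha."""
    u = pivotal_values(tokens, key)
    statistic = -np.log1p(-np.clip(u, 0.0, 1.0 - 1e-12)).sum()
    threshold = stats.gamma.ppf(1.0 - alpha, a=len(u))
    return statistic > threshold
```

The Gamma null quantile gives exact false positive control for the simple sum; the power analysis and optimal rules described in the abstract replace this sum with statistics tailored to the particular watermark.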

Keywords

Large language model

Watermark

ChatGPT

Detection 

Speaker

Weijie Su, University of Pennsylvania

ARK: Robust Knockoffs Inference with Coupling

Speaker

Jinchi Lv, University of Southern California

Convergence Rates of Oblique Regression Trees for Flexible Function Libraries

Decision trees and neural networks are conventionally viewed as two contrasting approaches to learning. The popular belief is that decision trees trade accuracy for being easy to use and understand, whereas neural networks are more accurate but less transparent. In this talk, we challenge the status quo by showing that, under suitable conditions, decision trees that recursively place splits along linear combinations of the covariates achieve modeling power and predictive accuracy comparable to single-hidden-layer neural networks. Importantly, the analytical framework presented here accommodates many existing computational tools in the literature, such as those based on randomization, dimensionality reduction, and mixed-integer optimization.
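To make the split rule concrete, here is a minimal sketch (assuming a least-squares direction search, one of several possibilities such a framework can cover) of growing an oblique regression tree in Python. The function names, direction estimator, and stopping rules are illustrative, not the algorithm analyzed in the talk.

```python
import numpy as np

def oblique_split(X, y):
    """Greedy oblique split: fit a least-squares direction w, project the
    covariates onto it, and choose the threshold on X @ w minimizing the
    within-child sum of squared errors (a CART criterion on the projection)."""
    w = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)[0][:-1]
    z = X @ w
    order = np.argsort(z)
    zs, ys = z[order], y[order]
    best_sse, best_thresh = np.inf, None
    for i in range(1, len(ys)):  # O(n^2) scan; fine for a sketch
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best_sse:
            best_sse, best_thresh = sse, (zs[i - 1] + zs[i]) / 2
    return w, best_thresh

def grow_tree(X, y, depth=0, max_depth=3, min_leaf=10):
    """Recursively grow an oblique regression tree; leaves predict the mean."""
    if depth == max_depth or len(y) < 2 * min_leaf:
        return {"leaf": y.mean()}
    w, t = oblique_split(X, y)
    mask = X @ w <= t
    if mask.sum() < min_leaf or (~mask).sum() < min_leaf:
        return {"leaf": y.mean()}
    return {"w": w, "t": t,
            "left": grow_tree(X[mask], y[mask], depth + 1, max_depth, min_leaf),
            "right": grow_tree(X[~mask], y[~mask], depth + 1, max_depth, min_leaf)}

def predict(node, x):
    """Route a single observation down the tree to its leaf prediction."""
    while "leaf" not in node:
        node = node["left"] if x @ node["w"] <= node["t"] else node["right"]
    return node["leaf"]
```

The point of the oblique split is visible in the first line of `oblique_split`: the direction w is a learned linear combination of all covariates, so a single split can act like one unit of a single-hidden-layer network rather than an axis-aligned cut.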

Keywords

decision trees, neural networks, greedy algorithms 

Speaker

Jason Klusowski, Princeton University

The approximation accuracy of Gaussian variational inference and related posterior approximations in high-dimensional Bayesian inference

The main computational challenge in Bayesian inference is computing integrals against a high-dimensional posterior distribution. In the past decades, variational inference (VI) has emerged as a tractable approximation to these integrals and a viable alternative to the more established paradigm of Markov chain Monte Carlo. However, little is known about the approximation accuracy of VI. We present new bounds on the total variation (TV) error and the mean and covariance approximation error of Gaussian VI in terms of dimension and sample size. Our proof technique is part of a general framework that allows one to precisely analyze the accuracy of asymptotic approximations to integrals against high-dimensional posteriors in the regime of posterior concentration. Based on the same framework, we also present new sharp bounds on the accuracy of the Laplace approximation, a related Gaussian posterior approximation method. Finally, we compare and contrast these two Gaussian approximations, VI and Laplace.
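As a toy, self-contained illustration of the two approximations being compared (a sketch under assumed data and model, not the analysis in the talk), the snippet below fits both a Laplace approximation and a mean-field Gaussian VI approximation to the same two-dimensional logistic regression posterior. The data-generating process, optimizer, and sample sizes are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy posterior: 2-D logistic regression with a standard normal prior.
X = rng.normal(size=(200, 2))
y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([1.0, -0.5]))))

def neg_log_post(b):
    logits = X @ b
    loglik = y @ logits - np.logaddexp(0, logits).sum()
    return 0.5 * b @ b - loglik  # negative log posterior, up to a constant

# Laplace approximation: Gaussian centered at the MAP, with covariance
# equal to the inverse Hessian of the negative log posterior at the MAP.
map_fit = minimize(neg_log_post, np.zeros(2))
p = 1 / (1 + np.exp(-X @ map_fit.x))
hessian = X.T @ (X * (p * (1 - p))[:, None]) + np.eye(2)
laplace_mean, laplace_cov = map_fit.x, np.linalg.inv(hessian)

# Mean-field Gaussian VI: maximize a reparameterized Monte Carlo ELBO over
# the variational mean and log standard deviations. A fixed set of standard
# normal draws keeps the objective deterministic for the optimizer.
eps = rng.standard_normal((512, 2))

def neg_elbo(params):
    m, log_s = params[:2], params[2:]
    z = m + np.exp(log_s) * eps                       # reparameterization trick
    log_p = -np.array([neg_log_post(zi) for zi in z])
    return -(log_p.mean() + log_s.sum())              # entropy term up to a constant

vi_fit = minimize(neg_elbo, np.zeros(4), method="Nelder-Mead",
                  options={"maxiter": 4000})
vi_mean, vi_std = vi_fit.x[:2], np.exp(vi_fit.x[2:])

print("Laplace:    ", laplace_mean, np.sqrt(np.diag(laplace_cov)))
print("Gaussian VI:", vi_mean, vi_std)
```

With this much data in two dimensions the posterior is close to Gaussian, so the two fits should nearly agree; the bounds discussed in the talk quantify how such approximation errors scale with dimension and sample size.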

Co-Author(s)

Philippe Rigollet, Massachusetts Institute of Technology
Anya Katsevich, Massachusetts Institute of Technology

Speaker

Anya Katsevich, Massachusetts Institute of Technology