Theory of Deep Learning and Generative AI

Chair: Song Mei, UC Berkeley
Organizer: Song Mei, UC Berkeley
 
Monday, Aug 4: 10:30 AM - 12:20 PM
Session 0152: Invited Paper Session
Music City Center
Room: CC-105B

Applied: No

Main Sponsor

IMS

Co-Sponsors

International Chinese Statistical Association
Section on Statistical Learning and Data Science

Presentations

Synthetic-Powered Predictive Inference

Conformal prediction is a framework for predictive inference with a distribution-free, finite-sample guarantee. However, it tends to provide uninformative prediction sets when calibration data are scarce. This paper introduces synthetic-powered predictive inference (SPPI), a novel framework that incorporates synthetic data, e.g., from a generative model, to improve sample efficiency. At the core of our method is a score transporter: an empirical quantile mapping that aligns nonconformity scores from trusted, real data with those from synthetic data. By carefully integrating the score transporter into the calibration process, SPPI provably achieves finite-sample coverage guarantees without making any assumptions about the real and synthetic data distributions. When the score distributions are well aligned, SPPI yields substantially tighter and more informative prediction sets than standard conformal prediction. Experiments on image classification, augmenting the data with synthetic images generated by a diffusion model, and on tabular regression demonstrate notable improvements in predictive efficiency in data-scarce settings.
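
For intuition, here is a minimal Python/NumPy sketch of the quantile-mapping idea: synthetic nonconformity scores are transported onto the scale of a small set of trusted real calibration scores before a conformal threshold is computed. This is an illustrative toy, not the SPPI calibration procedure itself; the function names, the Gamma-distributed toy scores, and the naive pooling of real and transported scores are assumptions made here for demonstration, and the pooling step alone does not carry the paper's coverage guarantee.

import numpy as np

def quantile_transport(syn_scores, real_scores):
    """Map each synthetic score to the real-score scale via an
    empirical quantile (rank) transformation."""
    syn_scores, real_scores = np.asarray(syn_scores), np.asarray(real_scores)
    # Empirical CDF rank of each synthetic score among the synthetic scores.
    ranks = np.searchsorted(np.sort(syn_scores), syn_scores, side="right") / len(syn_scores)
    # Read off the corresponding empirical quantiles of the real scores.
    return np.quantile(real_scores, np.clip(ranks, 0.0, 1.0))

def conformal_threshold(scores, alpha=0.1):
    """Split-conformal quantile with the usual finite-sample correction."""
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

# Toy usage: few real calibration scores, many synthetic ones.
rng = np.random.default_rng(0)
real = rng.gamma(2.0, 1.0, size=30)            # scarce trusted scores
syn = rng.gamma(2.0, 1.3, size=3000)           # abundant, slightly shifted scores
transported = quantile_transport(syn, real)    # aligned to the real scale
q_hat = conformal_threshold(np.concatenate([real, transported]), alpha=0.1)
print(f"prediction-set threshold: {q_hat:.3f}")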

Keywords

deep learning 

Speaker

Edgar Dobriban, University of Pennsylvania

Deterministic Equivalents and Scaling Laws for Random Feature Regression

In this talk, we revisit random feature ridge regression (RFRR), a model that has recently gained renewed interest for investigating puzzling phenomena in deep learning—such as double descent, benign overfitting, and scaling laws. Our main contribution is a general deterministic equivalent for the test error of RFRR. Specifically, under a certain concentration property, we show that the test error is well approximated by a closed-form expression that only depends on the feature map eigenvalues. Notably, our approximation guarantee is non-asymptotic, multiplicative, and independent of the feature map dimension—allowing for infinite-dimensional features.
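
For orientation only, a non-asymptotic multiplicative guarantee of this type can be written schematically as
\[
\bigl| R_{\mathrm{test}}(\lambda) - \mathsf{R}(\lambda) \bigr| \;\le\; \varepsilon\, \mathsf{R}(\lambda) \quad \text{with high probability,}
\]
where $R_{\mathrm{test}}(\lambda)$ is the RFRR test error at ridge penalty $\lambda$ and $\mathsf{R}(\lambda)$ is a deterministic, closed-form function of the feature-map eigenvalues alone. The precise expression for $\mathsf{R}$, the tolerance $\varepsilon$, and the concentration conditions are those of the talk; this display only fixes the shape of the statement.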

This deterministic equivalent can be used to precisely capture the above phenomenology in RFRR. As an example, we derive sharp excess error rates under standard power-law assumptions on the spectrum and the target decay. In particular, we tightly characterize the optimal parametrization achieving the minimax rate.

This is based on joint work with Basil Saeed (Stanford), Leonardo Defilippis (ENS), and Bruno Loureiro (ENS). 
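
As a complement, the short NumPy simulation below fits random feature ridge regression with ReLU features on a toy Gaussian linear model and reports the test error as the number of random features sweeps past the sample size. It is an illustrative sketch of the quantity that the deterministic equivalent approximates; the data model, feature map, ridge penalty, and problem sizes are assumptions chosen here purely for demonstration.

import numpy as np

rng = np.random.default_rng(1)

def rfrr_test_error(X_tr, y_tr, X_te, y_te, p, lam):
    """Fit random feature ridge regression with p ReLU features on
    (X_tr, y_tr) and return the mean squared error on (X_te, y_te)."""
    d = X_tr.shape[1]
    W = rng.standard_normal((d, p)) / np.sqrt(d)   # random, untrained first layer
    Z_tr = np.maximum(X_tr @ W, 0.0)               # ReLU feature map
    Z_te = np.maximum(X_te @ W, 0.0)
    coef = np.linalg.solve(Z_tr.T @ Z_tr + lam * np.eye(p), Z_tr.T @ y_tr)
    return np.mean((Z_te @ coef - y_te) ** 2)

# Toy data (assumed for illustration): Gaussian inputs, noisy linear target.
d, n_tr, n_te, noise = 20, 300, 2000, 0.5
beta = rng.standard_normal(d) / np.sqrt(d)
X_tr, X_te = rng.standard_normal((n_tr, d)), rng.standard_normal((n_te, d))
y_tr = X_tr @ beta + noise * rng.standard_normal(n_tr)
y_te = X_te @ beta + noise * rng.standard_normal(n_te)

# Sweep the number of random features; near p ~ n_tr the test error typically
# peaks before decreasing again (the double-descent profile studied for RFRR).
for p in [50, 150, 300, 600, 1200]:
    print(p, round(rfrr_test_error(X_tr, y_tr, X_te, y_te, p, lam=1e-4), 4))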

Keywords

Random Feature Regression

Random Matrix Theory

Scaling laws

Benign overfitting 

Co-Author

Theodor Misiakiewicz, Yale University

Speaker

Theodor Misiakiewicz, Yale University

The Emergence of Generalizability and Semantic Low-Dim Subspaces in Diffusion Models

Recent empirical studies have shown that diffusion models possess a unique reproducibility property, transitioning from memorization to generalization as the number of training samples increases. This demonstrates that diffusion models can effectively learn image distributions and generate new samples. Remarkably, these models achieve this even with a small number of training samples, despite the challenge of large image dimensions, effectively circumventing the curse of dimensionality. In this work, we provide theoretical insights into this phenomenon by leveraging two key empirical observations: (i) the low intrinsic dimensionality of image datasets and (ii) the low-rank property of the denoising autoencoder in trained diffusion models. Under this setup, we rigorously demonstrate that optimizing the training loss of diffusion models is equivalent to solving the canonical subspace clustering problem across the training samples. This insight has practical implications for training and controlling diffusion models. Specifically, it enables us to precisely characterize the minimal number of samples needed to accurately learn the low-rank data support, shedding light on the phase transition from memorization to generalization. Additionally, we empirically establish a correspondence between the subspaces and the semantic representations of image data, which enables one-step, transferable, and efficient image editing. Moreover, our results have profound practical implications for training efficiency and model safety, and they open up numerous intriguing theoretical questions for future research.
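
To make the low-rank observation concrete, the NumPy sketch below computes the minimum mean-squared-error denoiser for a toy model in which the data lie exactly on an r-dimensional linear subspace with a Gaussian latent. In that idealized case the optimal denoiser is linear with a rank-r Jacobian, mirroring the low-rank denoising autoencoders observed in trained diffusion models; the single-subspace model, the dimensions, and the noise level are assumptions made here for illustration rather than the setting analyzed in the talk.

import numpy as np

rng = np.random.default_rng(2)
d, r, sigma = 64, 5, 0.3   # ambient dimension, intrinsic dimension, noise level

# Assumed toy data model: x = U z with z ~ N(0, I_r) and U a d x r orthonormal
# basis, so x lies on an r-dimensional subspace of R^d.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))

# For y = x + sigma * eps with Gaussian noise, the MMSE denoiser E[x | y] is linear:
#   D(y) = U U^T (U U^T + sigma^2 I)^{-1} y = (1 / (1 + sigma^2)) U U^T y,
# so its Jacobian has rank r, far below the ambient dimension d.
D = (1.0 / (1.0 + sigma**2)) * (U @ U.T)

print("Jacobian rank of the optimal denoiser:", np.linalg.matrix_rank(D))  # -> r = 5
print("Ambient dimension:", d)                                             # -> 64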

Keywords

diffusion models

low-dimensionality

distribution learning 

Speaker

Qing Qu, University of Michigan