Monday, Aug 3: 10:30 AM - 12:20 PM
1220
Invited Paper Session
Deep learning and generative AI have achieved unprecedented practical success, yet their scale and complexity outpace classical statistical theory and call for new mathematical understanding. Fundamental questions remain about why these models generalize, how to efficiently adapt them in high-dimensional regimes, and what statistical principles underlie their generative and biologically inspired learning mechanisms. This session highlights recent advances that bring rigorous mathematical and statistical tools to bear on these challenges, offering theory-driven perspectives that demystify modern architectures, reveal structure in unstructured data, and establish principled foundations for scalable and interpretable learning. By bridging classical insights with contemporary AI practice, the session underscores the central role of mathematics and statistics in explaining and guiding the future of deep learning and generative modeling.
Applied
Yes
Main Sponsor
IMS
Co Sponsors
International Chinese Statistical Association
Section on Statistical Learning and Data Science
Presentations
Large language models are capable of in-context learning, the ability to perform new tasks at test time using a handful of input-output examples, without parameter updates. We develop a universal approximation theory to elucidate how transformers enable in-context learning. For a general class of functions (each representing a distinct task), we demonstrate how to construct a transformer that, without any further weight updates, can predict based on a few noisy in-context examples with vanishingly small risk. Unlike prior work that frames transformers as approximators of optimization algorithms (e.g., gradient descent) for statistical learning tasks, we integrate Barron's universal function approximation theory with the algorithm approximator viewpoint. Our approach yields approximation guarantees that are not constrained by the effectiveness of the optimization algorithms being mimicked, extending far beyond convex problems like linear regression. The key is to show that (i) any target function can be nearly linearly represented, with small L1-norm, over a set of universal features, and (ii) a transformer can be constructed to find the linear representation -- akin to solving Lasso -- at test time. This is joint work with Gen Li, Yuchen Jiao, Yu Huang, and Yuting Wei.
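As a toy numerical illustration of points (i) and (ii) only (this is not the transformer construction from the talk; the random ReLU feature dictionary, sample sizes, and regularization level below are illustrative assumptions), one can recover a new target function from a handful of noisy examples by solving a Lasso problem over a fixed set of shared features, with no updates to the feature map itself:

# Toy sketch: sparse linear representation of a new task over fixed
# "universal" random features, learned from a few noisy examples via Lasso.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Fixed dictionary of random ReLU features, shared across tasks.
d, m = 1, 200                                   # input dimension, feature count
W = rng.normal(size=(m, d))
b = rng.uniform(-1, 1, size=m)
phi = lambda x: np.maximum(x @ W.T + b, 0.0)    # feature map (never retrained)

# One "task": a target function observed through a few noisy examples.
f = lambda x: np.sin(3 * x).ravel()
n = 20                                          # handful of in-context examples
X = rng.uniform(-1, 1, size=(n, d))
y = f(X) + 0.1 * rng.normal(size=n)

# Find a small-L1 linear representation of f over the features at "test time",
# i.e. solve a Lasso problem instead of updating any weights.
lasso = Lasso(alpha=0.01).fit(phi(X), y)

X_test = np.linspace(-1, 1, 200).reshape(-1, 1)
err = np.mean((lasso.predict(phi(X_test)) - f(X_test)) ** 2)
print(f"test MSE of the Lasso representation: {err:.4f}")

The point of the sketch is the division of labor described in the abstract: once a rich shared feature set exists, adapting to a new task reduces to a small sparse regression on the in-context examples, which is the role the constructed transformer is shown to play at test time.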
Convolutional Neural Networks (CNNs) are foundational in modern image analysis due to their ability to efficiently learn feature representations. However, theoretical understanding of their efficiency remains limited, largely because existing models inadequately capture image structures and their interaction with CNNs. To address this, we introduce novel statistical generative models (SGMs) that decompose images into task-relevant signals and noise, capturing the complexities of natural image data. Based on these SGMs, we propose a feature mapping approach (FMA) to characterize the transformation from raw image data to feature vectors. We analyze CNNs' approximation capabilities, their adaptation to low-dimensional structures, and their efficiency in vision tasks, ultimately developing statistical learning theories for CNN-based image analysis. Our findings reveal the challenges inherent in vision tasks and highlight CNNs' remarkable efficiency in addressing them, providing new insights into their theoretical and practical capabilities. This is based on joint work with Dr. Guohao Shen.
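As a minimal caricature of the signal-plus-noise viewpoint (not the SGMs or FMA developed in the talk; the image size, square-shaped signal, noise level, and hand-crafted filter are all illustrative assumptions), the sketch below generates an image as a low-dimensional, task-relevant signal corrupted by pixel noise and localizes it with a single convolution-style filter:

# Toy signal-plus-noise image: a bright square (the task-relevant signal)
# embedded in pixel noise, detected by one edge filter.
import numpy as np

rng = np.random.default_rng(1)

H = W = 32
r, c = rng.integers(4, 20, size=2)          # low-dimensional latent position
img = np.zeros((H, W))
img[r:r + 8, c:c + 8] = 1.0                 # task-relevant signal
img = img + 0.2 * rng.normal(size=(H, W))   # nuisance pixel noise

# A hand-crafted [1, -1] filter standing in for one learned CNN feature map:
# it responds to vertical edges and averages out flat noise.
edge_response = np.abs(img[:, :-1] - img[:, 1:])
col_response = edge_response.sum(axis=0)

print("square occupies columns", c, "to", c + 7)
print("two strongest edge responses at columns",
      np.sort(np.argsort(col_response)[-2:]))   # should bracket the square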
Keywords
Convolutional neural networks
Image data
Statistical generative model
Point process
Approximation theory
Artificial neural networks are inspired by the functioning of the brain but differ in several key aspects. In biological neural networks, information is encoded in the spiking times of neurons. Furthermore, it is implausible that biological learning is based on gradient descent. This has prompted researchers to propose various biologically inspired learning procedures, but these methods still lack a solid theoretical foundation. While statistical theory for artificial neural networks has matured in recent years, the aim now is to extend this theory to biological neural networks, since the future of AI is likely to draw even more inspiration from biology. In this talk, we will explore the challenges and present statistical risk bounds for several biologically inspired learning rules.
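As one concrete point of reference, a classical example of a local, gradient-free learning rule is Oja's Hebbian rule (a rate-based rule, not necessarily among those analysed in the talk; the synthetic data model below is an illustrative assumption). Each update uses only locally available pre- and post-synaptic activity, yet the weights converge to the leading principal direction of the inputs:

# Oja's rule: Hebbian update with decay, no gradient of a global loss.
import numpy as np

rng = np.random.default_rng(2)

# Synthetic inputs with one dominant direction (the first coordinate).
n, d = 2000, 5
X = rng.normal(size=(n, d)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])

w = rng.normal(size=d)
eta = 0.01
for x in X:
    y = w @ x                       # post-synaptic activity
    w += eta * y * (x - y * w)      # local Hebbian update with decay

print("learned weights (approx. +/- first principal direction):", np.round(w, 2))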
Over-parameterized deep neural networks necessitate new statistical complexity measures that accurately capture generalization behavior. We analyze training as a dynamic statistical process characterized by distinct phases of feature learning and structural evolution. Our approach leverages tools from Singular Learning Theory (SLT), particularly the Local Learning Coefficient (LLC), which provides singularity-aware measures of effective statistical capacity. We systematically correlate the evolution of several complexity metrics (the LLC and norm-based measures) with feature learning and generalization performance. Crucially, we introduce stabilized complexity measures that are invariant across function-equivalent parameter sets, ensuring a statistically reliable, geometry-aware estimate of the model's true generalization capacity.
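For readers unfamiliar with the LLC, one commonly used estimator (our gloss; the abstract does not spell out which estimator the talk uses) localizes a tempered posterior around the trained parameter $w^*$ and sets

$\hat{\lambda}(w^*) = n\,\beta^*\Big(\mathbb{E}_{w \sim p_{\beta^*}(\cdot \mid w^*)}\big[L_n(w)\big] - L_n(w^*)\Big), \qquad \beta^* = 1/\log n,$

where $L_n$ is the empirical loss and the expectation is taken under a posterior concentrated near $w^*$; larger values indicate greater effective capacity around that solution.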
Keywords
deep learning
generalization
complexity measures
feature learning
Speaker
Jakob Heiss, UC Berkeley
Co-Author
Bin Yu, University of California at Berkeley