Monday, Aug 3: 10:30 AM - 12:20 PM
1220
Invited Paper Session
Deep learning and generative AI have achieved unprecedented practical success, yet their scale and complexity outpace classical statistical theory and call for new mathematical understanding. Fundamental questions remain about why these models generalize, how to efficiently adapt them in high-dimensional regimes, and what statistical principles underlie their generative and biologically inspired learning mechanisms. This session highlights recent advances that bring rigorous mathematical and statistical tools to bear on these challenges, offering theory-driven perspectives that demystify modern architectures, reveal structure in unstructured data, and establish principled foundations for scalable and interpretable learning. By bridging classical insights with contemporary AI practice, the session underscores the central role of mathematics and statistics in explaining and guiding the future of deep learning and generative modeling.
Applied
Yes
Main Sponsor
IMS
Co Sponsors
International Chinese Statistical Association
Section on Statistical Learning and Data Science
Presentations
Large language models are capable of in-context learning, the ability to perform new tasks at test time using a handful of input-output examples, without parameter updates. We develop a universal approximation theory to elucidate how transformers enable in-context learning. For a general class of functions (each representing a distinct task), we demonstrate how to construct a transformer that, without any further weight updates, can predict based on a few noisy in-context examples with vanishingly small risk. Unlike prior work that frames transformers as approximators of optimization algorithms (e.g., gradient descent) for statistical learning tasks, we integrate Barron's universal function approximation theory with the algorithm approximator viewpoint. Our approach yields approximation guarantees that are not constrained by the effectiveness of the optimization algorithms being mimicked, extending far beyond convex problems like linear regression. The key is to show that (i) any target function can be nearly linearly represented, with small L1-norm, over a set of universal features, and (ii) a transformer can be constructed to find the linear representation -- akin to solving Lasso -- at test time. This is joint work with Gen Li, Yuchen Jiao, Yu Huang, and Yuting Wei.
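As a toy numerical illustration of points (i) and (ii) only (this is not the transformer construction from the talk; the random ReLU feature dictionary, sample sizes, and regularization level below are illustrative assumptions), one can recover a new target function from a handful of noisy examples by solving a Lasso problem over a fixed set of shared features, with no updates to the feature map itself:

# Toy sketch: sparse linear representation of a new task over fixed
# "universal" random features, learned from a few noisy examples via Lasso.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Fixed dictionary of random ReLU features, shared across tasks.
d, m = 1, 200                                   # input dimension, feature count
W = rng.normal(size=(m, d))
b = rng.uniform(-1, 1, size=m)
phi = lambda x: np.maximum(x @ W.T + b, 0.0)    # feature map (never retrained)

# One "task": a target function observed through a few noisy examples.
f = lambda x: np.sin(3 * x).ravel()
n = 20                                          # handful of in-context examples
X = rng.uniform(-1, 1, size=(n, d))
y = f(X) + 0.1 * rng.normal(size=n)

# Find a small-L1 linear representation of f over the features at "test time",
# i.e. solve a Lasso problem instead of updating any weights.
lasso = Lasso(alpha=0.01).fit(phi(X), y)

X_test = np.linspace(-1, 1, 200).reshape(-1, 1)
err = np.mean((lasso.predict(phi(X_test)) - f(X_test)) ** 2)
print(f"test MSE of the Lasso representation: {err:.4f}")

The point of the sketch is the division of labor described in the abstract: once a rich shared feature set exists, adapting to a new task reduces to a small sparse regression on the in-context examples, which is the role the constructed transformer is shown to play at test time.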
Convolutional Neural Networks (CNNs) are foundational in modern image analysis due to their ability to efficiently learn feature representations. However, theoretical understanding of their efficiency remains limited, largely because existing models inadequately capture image structures and their interaction with CNNs. To address this, we introduce novel statistical generative models (SGMs) that decompose images into task-relevant signals and noise, capturing the complexities of natural image data. Based on these SGMs, we propose a feature mapping approach (FMA) to characterize the transformation from raw image data to feature vectors. We analyze CNNs' approximation capabilities, their adaptation to low-dimensional structures, and their efficiency in vision tasks, ultimately developing statistical learning theories for CNN-based image analysis. Our findings reveal the challenges inherent in vision tasks and highlight CNNs' remarkable efficiency in addressing them, providing new insights into their theoretical and practical capabilities. This is based on joint work with Dr. Guohao Shen.
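As a minimal caricature of the signal-plus-noise viewpoint (not the SGMs or FMA developed in the talk; the image size, square-shaped signal, noise level, and hand-crafted filter are all illustrative assumptions), the sketch below generates an image as a low-dimensional, task-relevant signal corrupted by pixel noise and localizes it with a single convolution-style filter:

# Toy signal-plus-noise image: a bright square (the task-relevant signal)
# embedded in pixel noise, detected by one edge filter.
import numpy as np

rng = np.random.default_rng(1)

H = W = 32
r, c = rng.integers(4, 20, size=2)          # low-dimensional latent position
img = np.zeros((H, W))
img[r:r + 8, c:c + 8] = 1.0                 # task-relevant signal
img = img + 0.2 * rng.normal(size=(H, W))   # nuisance pixel noise

# A hand-crafted [1, -1] filter standing in for one learned CNN feature map:
# it responds to vertical edges and averages out flat noise.
edge_response = np.abs(img[:, :-1] - img[:, 1:])
col_response = edge_response.sum(axis=0)

print("square occupies columns", c, "to", c + 7)
print("two strongest edge responses at columns",
      np.sort(np.argsort(col_response)[-2:]))   # should bracket the square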
Keywords
Convolutional neural networks
Image data
Statistical generative model
Point process
Approximation theory
Artificial neural networks are inspired by the functioning of the brain but differ in several key aspects. In biological neural networks, information is encoded in the spiking times of neurons. Furthermore, it is implausible that biological learning is based on gradient descent. This has prompted researchers to propose various biologically inspired learning procedures, but these methods still lack a solid theoretical foundation. While statistical theory for artificial neural networks has matured in recent years, the aim now is to extend this theory to biological neural networks, since the future of AI is likely to draw even more inspiration from biology. In this talk, we will explore the challenges and present statistical risk bounds for several biologically inspired learning rules.
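As one concrete point of reference, a classical example of a local, gradient-free learning rule is Oja's Hebbian rule (a rate-based rule, not necessarily among those analysed in the talk; the synthetic data model below is an illustrative assumption). Each update uses only locally available pre- and post-synaptic activity, yet the weights converge to the leading principal direction of the inputs:

# Oja's rule: Hebbian update with decay, no gradient of a global loss.
import numpy as np

rng = np.random.default_rng(2)

# Synthetic inputs with one dominant direction (the first coordinate).
n, d = 2000, 5
X = rng.normal(size=(n, d)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])

w = rng.normal(size=d)
eta = 0.01
for x in X:
    y = w @ x                       # post-synaptic activity
    w += eta * y * (x - y * w)      # local Hebbian update with decay

print("learned weights (approx. +/- first principal direction):", np.round(w, 2))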
Over-parameterized deep neural networks necessitate new statistical complexity measures that accurately capture generalization behavior. We analyze training as a dynamic statistical process characterized by distinct phases of feature learning and structural evolution. Our approach leverages tools from Singular Learning Theory (SLT), particularly the Local Learning Coefficient (LLC), which provides singularity-aware measures of effective statistical capacity. We systematically correlate the evolution of several complexity metrics (the LLC and norm-based measures) with feature learning and generalization performance. Crucially, we introduce stabilized complexity measures that are invariant across function-equivalent parameter sets, ensuring a statistically reliable, geometry-aware estimate of the model's true generalization capacity.
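For readers unfamiliar with the LLC, one commonly used estimator (our gloss; the abstract does not spell out which estimator the talk uses) localizes a tempered posterior around the trained parameter $w^*$ and sets

$\hat{\lambda}(w^*) = n\,\beta^*\Big(\mathbb{E}_{w \sim p_{\beta^*}(\cdot \mid w^*)}\big[L_n(w)\big] - L_n(w^*)\Big), \qquad \beta^* = 1/\log n,$

where $L_n$ is the empirical loss and the expectation is taken under a posterior concentrated near $w^*$; larger values indicate greater effective capacity around that solution.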
Keywords
deep learning
generalization
complexity measures
feature learning
Speaker
Jakob Heiss, UC Berkeley
Co-Author
Bin Yu, University of California at Berkeley