Limit Theorems for Stochastic Gradient Descent with Infinite Variance

Jose Blanchet (Co-Author), Stanford University

Aleksandar Mijatović (Co-Author), University of Warwick

Wenhao Yang (Speaker)
Thursday, Aug 7: 9:35 AM - 9:55 AM
Topic-Contributed Paper Session 
Music City Center 
Stochastic gradient descent is a classic algorithm that has gained great popularity, especially in
recent decades, as the most common approach for training models in machine learning. While the
algorithm has been well studied when the stochastic gradients are assumed to have finite variance,
there is significantly less research addressing its theoretical properties in the case of infinite variance
gradients. In this paper, we establish the asymptotic behavior of stochastic gradient descent in the
context of infinite variance stochastic gradients, assuming that the stochastic gradient is regularly
varying with index α ∈ (1, 2). The closest result in this context was established in 1969, in the
one-dimensional case and under the assumption that the stochastic gradients belong to a more restrictive class of
distributions. We extend that result to the multidimensional case, covering a broader class of infinite variance
distributions. As we show, the asymptotic distribution of the stochastic gradient descent algorithm
can be characterized as the stationary distribution of a suitably defined Ornstein-Uhlenbeck process
driven by an appropriate stable Lévy process. Additionally, we explore applications of these
results to linear regression and logistic regression models.
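As a rough illustration of the setting (not code from the paper), the sketch below simulates SGD for linear regression with Pareto-type response noise of tail index α = 1.5, so the stochastic gradients have finite mean but infinite variance; the function names, step-size choice, and all parameter values are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: SGD for linear regression where the gradient noise
# is regularly varying with tail index alpha in (1, 2), i.e. infinite variance.
rng = np.random.default_rng(0)

d = 3                      # dimension of the parameter
theta_star = np.ones(d)    # true regression coefficients (assumed for the demo)
alpha = 1.5                # tail index of the noise, alpha in (1, 2)

def heavy_tailed_noise(size):
    # Symmetric Pareto-type noise: P(|Z| > t) ~ t^(-alpha), hence Var(Z) = infinity.
    signs = rng.choice([-1.0, 1.0], size=size)
    return signs * rng.pareto(alpha, size=size)

def sgd(n_iter, step):
    theta = np.zeros(d)
    for k in range(1, n_iter + 1):
        x = rng.normal(size=d)                          # covariate
        y = x @ theta_star + heavy_tailed_noise(1)[0]   # heavy-tailed response noise
        grad = (x @ theta - y) * x                      # stochastic gradient of the squared loss
        theta -= (step / k) * grad                      # classical Robbins-Monro step size c/k
    return theta

# With infinite-variance gradients, the fluctuations of theta around theta_star
# are expected (per the abstract) to be alpha-stable rather than Gaussian in the limit.
samples = np.array([sgd(n_iter=5000, step=0.5) for _ in range(200)])
print("empirical mean of final iterates:", samples.mean(axis=0))
```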

Keywords

Lévy process

Stochastic gradient descent

Heavy tail