Limit Theorems for Stochastic Gradient Descent with Infinite Variance
Thursday, Aug 7: 9:35 AM - 9:55 AM
Topic-Contributed Paper Session
Music City Center
Stochastic gradient descent is a classic algorithm that has gained great popularity, especially in
recent decades, as the most common approach for training models in machine learning. While the
algorithm has been well-studied when stochastic gradients are assumed to have a finite variance,
there is significantly less research addressing its theoretical properties in the case of infinite-variance
gradients. In this paper, we establish the asymptotic behavior of stochastic gradient descent in the
context of infinite-variance stochastic gradients, assuming that the stochastic gradient is regularly
varying with index α ∈ (1, 2). The closest result in this context was established in 1969, in the
one-dimensional case and under the assumption that the stochastic gradients belong to a more restrictive
class of distributions. We extend this result to the multidimensional case, covering a broader class of infinite-variance
distributions. As we show, the asymptotic distribution of the stochastic gradient descent algorithm
can be characterized as the stationary distribution of a suitably defined Ornstein-Uhlenbeck process
driven by an appropriate stable Lévy process. Additionally, we explore the applications of these
results in linear regression and logistic regression models.
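To make the setting concrete, the following minimal Python sketch (illustrative only, not the authors' code) runs SGD on a linear regression problem whose stochastic gradients carry additive symmetric Pareto-type noise with tail index α = 1.5 ∈ (1, 2), so the gradient variance is infinite while the mean is finite; the dimension, step size, and noise model are assumptions made for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    # Least-squares setup: loss f(x) = E[(a^T x - b)^2] / 2 in d dimensions.
    d = 5
    x_star = np.ones(d)   # true parameter (chosen for the illustration)
    alpha = 1.5           # tail index in (1, 2): infinite variance, finite mean

    def pareto_noise(size):
        # Symmetric Pareto-type noise, regularly varying with index alpha.
        signs = rng.choice([-1.0, 1.0], size=size)
        return signs * (rng.pareto(alpha, size=size) + 1.0)

    def stochastic_gradient(x):
        # One noisy gradient of the least-squares loss; the additive noise
        # makes the stochastic gradient heavy-tailed with infinite variance.
        a = rng.standard_normal(d)
        b = a @ x_star
        return a * (a @ x - b) + pareto_noise(d)

    # Plain SGD with a small constant step size; a decreasing Robbins-Monro
    # schedule is the setting of the limit theorem, a constant step just
    # keeps the sketch short.
    eta = 1e-3
    x = np.zeros(d)
    for _ in range(200_000):
        x -= eta * stochastic_gradient(x)

    print("final iterate:", x)  # fluctuations around x_star are heavy-tailed

The Pareto draws stand in for any regularly varying noise: when α < 2, occasional very large gradient steps dominate the fluctuations of the iterates, which is why the limiting behavior is stable (Lévy-driven) rather than Gaussian.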
Lévy process
Stochastic gradient descent
Heavy tails