Layered Models can "Automatically" Discover Low-Dimensional Structures via Feature Learning
Yang Li
Co-Author
Massachusetts Institute of Technology
Yunlu Chen
Presenting Author
Northwestern University
Thursday, Aug 7: 11:35 AM - 11:50 AM
1382
Contributed Papers
Music City Center
Layered models like neural networks appear to extract key features from data through empirical risk minimization, yet the theoretical understanding of this process remains limited. Motivated by this observation, we study a two-layer nonparametric regression model in which the input undergoes a linear transformation followed by a nonlinear mapping to predict the output, mirroring the structure of two-layer neural networks. In our model, both layers are optimized jointly through empirical risk minimization, with the nonlinear layer modeled by a reproducing kernel Hilbert space induced by a rotation- and translation-invariant kernel and regularized by a ridge penalty.
Our main result shows that the two-layer model can "automatically" induce regularization and facilitate feature learning. Specifically, the two-layer model promotes dimensionality reduction in the linear layer and identifies a parsimonious subspace of relevant features, even without applying any norm penalty on the linear layer. Notably, this regularization effect arises directly from the model's layered structure. Experiments on real-world data further demonstrate that this phenomenon persists in practice.
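The abstract describes the model only in words; the following is a minimal, hypothetical sketch of such a two-layer fit on synthetic single-index data, not the authors' implementation. The kernel choice (Gaussian), the linear-layer width, the ridge level, the step size, and all variable names are assumptions made for illustration; the linear layer is updated by gradient descent while the RKHS layer is re-solved in closed form, which is one possible way to carry out the joint empirical risk minimization.

```python
# Illustrative sketch only: two-layer model y ~ f(A x), with A a linear layer and
# f in the RKHS of a Gaussian (rotation- and translation-invariant) kernel,
# fit jointly by ridge-regularized empirical risk minimization.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
n, d, r = 200, 10, 3                      # samples, input dimension, linear-layer width (assumed)
lam = 1e-2                                # ridge penalty on the RKHS layer (assumed)

# Synthetic data whose regression function depends on a single direction.
kx, ke, ka = jax.random.split(key, 3)
X = jax.random.normal(kx, (n, d))
beta = jnp.zeros(d).at[0].set(1.0)        # true one-dimensional relevant subspace
y = jnp.sin(X @ beta) + 0.1 * jax.random.normal(ke, (n,))

def gram(Z, bw=1.0):
    """Gaussian kernel Gram matrix of the transformed inputs."""
    sq = jnp.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-sq / (2.0 * bw ** 2))

def risk(A):
    """Regularized empirical risk with the RKHS layer profiled out in closed form."""
    K = gram(X @ A.T)
    alpha = jnp.linalg.solve(K + n * lam * jnp.eye(n), y)   # kernel ridge solution
    resid = y - K @ alpha
    return jnp.mean(resid ** 2) + lam * alpha @ K @ alpha

grad_risk = jax.jit(jax.grad(risk))

# Plain gradient descent on the linear layer; note that no norm penalty is placed on A.
A = jax.random.normal(ka, (r, d)) / jnp.sqrt(d)
for _ in range(300):
    A = A - 0.5 * grad_risk(A)

# A sharp drop after the first singular value of the learned A would indicate
# that the layered fit has concentrated on a low-dimensional subspace.
print(jnp.linalg.svd(A, compute_uv=False))
```

In this sketch the dimensionality reduction claimed in the abstract would show up as a rapid decay of the singular values of the learned linear layer, even though only the RKHS layer carries an explicit penalty.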
layered models
regularization
feature learning
central mean subspace
reproducing kernel Hilbert space
ridge regression
Main Sponsor
Section on Statistical Learning and Data Science