Approximation and Optimization for Neural Networks
In this talk, we consider new connections between the approximation and optimization of neural networks. Instead of relying on excessive over-parametrization to achieve zero training loss, we identify good minima by comparison with established approximation bounds.
Overview
Theoretically, neural networks have remarkable approximation properties, such as super-convergence for Sobolev-smooth functions and dimension-independent rates for Barron and related smoothness classes. Practically, it remains an open question to what extent this theoretical performance can be matched by available training algorithms. Optimization results prove that gradient descent achieves global minima. In the majority of the literature, these global minima are identified by zero training loss, made possible by severe over-parametrization.
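The role of over-parametrization in reaching zero training loss can be illustrated with a minimal sketch (not from the talk; all sizes and names are illustrative). To keep the example certain to behave as described, only the outer weights of a shallow ReLU network are trained, so the loss is convex; this is a lazy-training caricature of the nonconvex setting the optimization literature actually analyzes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 1-d regression task: 10 samples of a smooth target.
X = np.linspace(-1.0, 1.0, 10)
y = np.sin(np.pi * X)
n = len(X)

# Severely over-parametrized shallow ReLU network: 200 hidden units,
# random inner weights held fixed; only the outer weights are optimized.
m = 200
w = rng.normal(size=m)
b = rng.uniform(-1.0, 1.0, size=m)
F = np.maximum(np.outer(X, w) + b, 0.0)       # feature matrix, shape (n, m)

def loss(a):
    return 0.5 * np.mean((F @ a - y) ** 2)

# 1) Over-parametrization (m >> n) makes zero training loss attainable:
#    the min-norm least-squares solution interpolates the data exactly.
a_star, *_ = np.linalg.lstsq(F, y, rcond=None)
print(f"interpolant loss: {loss(a_star):.2e}")   # numerically zero

# 2) Plain gradient descent drives the loss toward that global minimum.
K = (F @ F.T) / n                             # empirical kernel matrix
lr = 1.5 / np.linalg.eigvalsh(K).max()        # stable step size (< 2/L)
a = np.zeros(m)
for _ in range(50000):
    a -= lr * F.T @ (F @ a - y) / n           # gradient of the loss
print(f"gradient-descent loss: {loss(a):.2e}")
```

With fewer hidden units than data points, exact interpolation is generally impossible; the severe over-parametrization is what makes the zero-loss global minimum exist and makes it reachable by plain gradient descent in this convex caricature.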
In this talk, we consider new connections between the approximation and optimization of neural networks. Instead of relying on excessive over-parametrization to achieve zero training loss, we identify good minima by comparison with established approximation bounds. In a landscape analysis under mean-field conditions, we prove that local optima show equidistribution behavior similar to adaptive mesh refinement in finite element methods. The equidistribution constants match Barron norms for shallow networks and allow us to propose generalizations of Barron smoothness for deep networks.
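For reference, the dimension-independent shallow-network rate behind these Barron norms is the classical bound of Barron (1993): for a target f with finite Barron norm, shallow networks with n neurons achieve (with the constant depending on the convention but not on the input dimension d)

    \inf_{f_n \in \Sigma_n} \, \| f - f_n \|_{L^2(\mu)} \;\le\; \frac{C \, \|f\|_{\mathcal{B}}}{\sqrt{n}},

where \Sigma_n denotes shallow networks with n neurons and \|f\|_{\mathcal{B}} is the Barron norm.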
Presenters
Gerrit Welper, Assistant Professor, Mathematics, University of Central Florida (UCF)
Brief Biography
Gerrit Welper is an assistant professor at the University of Central Florida. He earned his Ph.D. at RWTH Aachen and held postdoctoral positions at Texas A&M University and the University of Southern California.
He has worked on stable and adaptive methods for convection-dominated PDEs and later constructed efficient reduced order models for parametric hyperbolic problems with shocks. More recently, his work has focused on the mathematical foundations of neural networks, in particular the interplay between approximation and optimization theory.