Principled Scaling of Deep Neural Networks

  • Soufiane Hayou, Postdoc, Simons Institute, UC Berkeley

Location: B9 L4 R4225


Overview

Abstract

Neural networks have achieved impressive performance in many applications, such as image and speech recognition and generation. State-of-the-art performance is usually reached through a series of engineered modifications to existing architectures and their training procedures. A common feature of these systems, however, is their large scale: modern neural networks typically contain billions, if not tens of billions, of trainable parameters, and empirical evidence generally supports the claim that increasing the scale of a network (e.g., its width and depth) improves performance, provided the scaling is done correctly. Yet, given a neural network model, it is not straightforward to answer the crucial question: how do we scale the network? In this talk, I will show how different mathematical results can be leveraged to scale neural networks efficiently, with empirically confirmed benefits.
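
One example of a principled scaling rule studied in this line of work is to damp each residual branch by 1/sqrt(depth), so that the forward signal remains stable as the number of layers grows. The sketch below illustrates this rule on a toy residual MLP in PyTorch; the module names, widths, and depths are illustrative choices, not code from the talk.

# Minimal sketch of depth-scaled residual branches (illustrative, PyTorch).
import math
import torch
import torch.nn as nn


class ScaledResidualMLP(nn.Module):
    """Residual MLP whose branches are damped by 1/sqrt(num_blocks)."""

    def __init__(self, width: int, num_blocks: int):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.ReLU())
             for _ in range(num_blocks)]
        )
        # Branch multiplier: with this damping, the forward variance stays
        # bounded as num_blocks grows; without it, it compounds with depth.
        self.branch_scale = 1.0 / math.sqrt(num_blocks)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = x + self.branch_scale * block(x)
        return x


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(8, 256)
    for depth in (4, 64, 256):
        model = ScaledResidualMLP(width=256, num_blocks=depth)
        with torch.no_grad():
            out = model(x)
        print(f"depth={depth:4d}  output std={out.std().item():.3f}")

Running this should print output standard deviations of comparable size across the three depths; dropping the branch_scale factor lets the output norm grow with depth, which is the instability that principled depth scaling is designed to avoid.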

 

Brief Biography

 

Soufiane Hayou obtained his PhD in statistics in 2021 from Oxford, where he was advised by Arnaud Doucet and Judith Rousseau. Before joining Oxford, he graduated from Ecole Polytechnique in Paris. During his PhD, he worked mainly on the theory of randomly initialized infinite-width neural networks, including how the choice of hyperparameters affects the way 'geometric' information propagates through the network. He is currently a researcher at the Simons Institute for the Theory of Computing, on leave from his Peng Tsu Ann Assistant Professorship in mathematics at the National University of Singapore. His current research focuses on the theory and practice of scaling neural networks.
