Soufiane Hayou, Postdoc, Simons Institute, UC Berkeley
Monday, February 26, 2024, 09:00 - 10:00
Building 9, Level 4, Room 4225
Neural networks have achieved impressive performance in many applications, such as image and speech recognition and generation. State-of-the-art performance is usually achieved through a series of engineered modifications to existing neural architectures and their training procedures. A common feature of these systems is their large-scale nature: modern neural networks usually contain billions, if not tens of billions, of trainable parameters, and empirical evaluations generally support the claim that increasing the scale of a neural network (e.g. its width and depth) boosts performance, provided it is done correctly. However, given a neural network model, it is not straightforward to answer the crucial question: how do we scale the network? In this talk, I will show how we can leverage different mathematical results to scale neural networks efficiently, with empirically confirmed benefits.
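As a minimal illustration of why scale must be handled "correctly" (a standard textbook example, not a result from the talk): initializing a linear layer's weights with standard deviation 1/sqrt(fan_in) keeps pre-activation magnitudes roughly constant as the width grows, whereas an unscaled initialization blows them up.

```python
import numpy as np

rng = np.random.default_rng(0)

def preactivation_rms(width, scaled=True):
    """RMS of pre-activations for one linear layer on a unit-variance input."""
    x = rng.standard_normal(width)
    # Classic 1/sqrt(fan_in) scaling vs. unscaled (std = 1) initialization.
    std = width ** -0.5 if scaled else 1.0
    W = rng.standard_normal((width, width)) * std
    return float(np.sqrt(np.mean((W @ x) ** 2)))

for n in (64, 256, 1024):
    # Scaled init stays O(1); unscaled init grows like sqrt(width).
    print(n, preactivation_rms(n, scaled=True), preactivation_rms(n, scaled=False))
```

With the scaled initialization the output magnitude is roughly width-independent; without it, activations grow with width, which is one simple reason naive scaling of a network can fail.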