
Overview

At the heart of deep learning's transformative impact lies the concept of scale, encompassing both data and computational resources, as well as their interaction with neural network architectures. Scale, however, presents critical challenges, such as increased instability during training and prohibitively expensive model-specific tuning. Given the substantial resources required to train such models, formulating high-confidence scaling hypotheses backed by rigorous theoretical research has become paramount.

To bridge theory and practice, the talk explores a key mathematical ingredient of scaling in tandem with scaling theory: the numerical solution algorithms commonly employed in deep learning, spanning domains from vision to language models. We unify these algorithms under a common master template, making their foundational principles transparent. In doing so, we reveal the interplay between adaptation to smoothness structures via online learning and the exploitation of optimization geometry through non-Euclidean norms. Our exposition moves beyond simply building larger models; it emphasizes strategic scaling, offering insights that promise to advance the field while economizing on resources.
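To give a flavor of what such a master template can look like, below is a minimal Python sketch (illustrative only; the function name master_step, the norm options, and the hyperparameters are our own choices, not the talk's notation). It combines a simple online accumulation of gradient information with a steepest-descent step measured in a chosen norm: the Euclidean norm recovers (momentum) SGD, while the max-norm yields sign descent, widely viewed as the deterministic core of Adam-style updates.

import numpy as np

def master_step(x, grad, state, lr, norm="l2", beta=0.9):
    """One step of an illustrative 'master template' optimizer."""
    # Online accumulation of first-order information
    # (exponential moving average, i.e., momentum).
    state["m"] = beta * state["m"] + (1.0 - beta) * grad
    m = state["m"]

    # Steepest descent with respect to a chosen norm.
    if norm == "l2":
        direction = m                        # Euclidean norm: (momentum) SGD
    elif norm == "linf":
        direction = np.sign(m)               # max-norm: sign descent, Adam's core
    elif norm == "l2_normalized":
        direction = m / (np.linalg.norm(m) + 1e-12)  # normalized descent
    else:
        raise ValueError(f"unknown norm: {norm}")
    return x - lr * direction, state

# Usage: drive f(x) = ||x||^2 (gradient 2x) toward its minimizer.
x = np.array([3.0, -2.0])
state = {"m": np.zeros_like(x)}
for t in range(1, 101):
    x, state = master_step(x, 2.0 * x, state, lr=0.5 / t, norm="linf")

Swapping the norm argument changes only the geometry of the step, not the template itself, which is the sense in which seemingly different deep learning optimizers can be read as instances of one scheme.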

Presenters

Volkan Cevher, Associate Professor, School of Engineering, Swiss Federal Institute of Technology (EPFL), Switzerland

Brief Biography

Volkan Cevher received the B.Sc. degree (valedictorian) in electrical engineering from Bilkent University, Ankara, Turkey, in 1999 and the Ph.D. degree in electrical and computer engineering from the Georgia Institute of Technology, Atlanta, GA, in 2005. He was a Research Scientist with the University of Maryland, College Park, from 2006 to 2007 and with Rice University, Houston, TX, from 2008 to 2009. He was also a Faculty Fellow in the Electrical and Computer Engineering Department at Rice University from 2010 to 2020. He is currently an Associate Professor at the Swiss Federal Institute of Technology Lausanne (EPFL) and an Amazon Scholar. His research interests include machine learning, optimization theory and methods, and automated control. Dr. Cevher is an IEEE Fellow ('24) and an ELLIS Fellow, and he received the ICML AdvML Best Paper Award in 2023, a Google Faculty Research Award in 2018, the IEEE Signal Processing Society Best Paper Award in 2016, a Best Paper Award at CAMSAP in 2015, a Best Paper Award at SPARS in 2009, an ERC Consolidator Grant in 2016, and an ERC Starting Grant in 2011.