Averaging, Momentum, and Schedulers in Optimization for Deep Learning

Konstantin Mishchenko

Coordinators:

Peter Richtárik, Professor, Computer Science

May 5, 11:00 - 13:00

B9 L3 R3128

Abstract

In this talk, I will present some work in progress on practical optimization methods for deep learning. We will start with a discussion of several empirical techniques that enable training of large-scale models in language and vision tasks, including weight decay, averaging, and schedulers. We will then look at a new approach that we call schedule-free due to its ability to work without a pre-defined time horizon. I will share some details about the theory for these methods, explain why they might be useful in practice and then shed some light on their limitations. This talk will be oriented towards people who already have some knowledge of optimization methods.

Brief Biography

Konstantin Mishchenko is a Research Scientist at Samsung in Cambridge, UK. Before joining Samsung, he was a postdoc in the group of Francis Bach at Inria Paris, and he did his PhD at KAUST under the supervision of Peter Richtárik. Konstantin’s work on adaptive methods received an Outstanding Paper Award at ICML 2023.
