Optimization for Deep Learning

Abstract

The field of optimization for machine learning has undergone significant changes in recent years, with deep learning models growing in scale and fine-tuning taking a more prominent role. In this presentation, I will share a perspective on how the field is evolving and highlight interesting research directions. I will give real-world examples of what practitioners want from optimization methods for training deep networks at scale. I will then present my recent work on adaptive methods such as Adam and Adagrad, and explain how the learning rate for these methods can be estimated using theoretical tools from convex deterministic optimization, together with convergence guarantees. Finally, I will present an extensive numerical evaluation of these methods on the task of training deep networks, including ViT on ImageNet, RoBERTa on BookWiki, GPT Transformer on BookWiki, and others.
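As background for the adaptive methods mentioned above, the following is a minimal NumPy sketch of a plain Adagrad-style update, included only to recall how such methods rescale each coordinate by accumulated squared gradients. It is not the speaker's method; the function name `adagrad_step`, the toy quadratic objective, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=1.0, eps=1e-8):
    """One Adagrad update: per-coordinate step size lr / (sqrt(accum) + eps)."""
    accum += grad ** 2                        # accumulate squared gradients
    x -= lr * grad / (np.sqrt(accum) + eps)   # rescale each coordinate
    return x, accum

# Toy usage (assumed example): minimize f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
x, accum = np.zeros(5), np.zeros(5)
for _ in range(500):
    grad = A.T @ (A @ x - b)
    x, accum = adagrad_step(x, grad, accum, lr=1.0)
print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```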

Brief Biography

Konstantin Mishchenko is a Research Scientist at Samsung in Cambridge, UK. Before joining Samsung, he was a postdoc in Francis Bach's group at Inria Paris, and he completed his PhD at KAUST under the supervision of Peter Richtárik. Konstantin's work on adaptive methods received an Outstanding Paper Award at ICML 2023.

Contact Person