Konstantin Mishchenko
Sunday, May 05, 2024, 11:00 - 13:00
Building 9, Level 3, Room 3128, https://kaust.zoom.us/j/95768114437
In this talk, I will present work in progress on practical optimization methods for deep learning. We will start by discussing several empirical techniques that enable training of large-scale models in language and vision tasks, including weight decay, averaging, and schedulers. We will then look at a new approach that we call schedule-free because it works without a predefined time horizon. I will share some details about the theory behind these methods, explain why they might be useful in practice, and then shed some light on their limitations. This talk is aimed at people who already have some knowledge of optimization methods.