The First Optimal Parallel SGD (in the Presence of Data, Compute and Communication Heterogeneity)

Graduate Seminar

Event Start

2024-09-02 - 11:30

Event End

2024-09-02 - 12:30

Location

Building 9, Level 2, Room 2325

Abstract

"The design of efficient parallel/distributed optimization methods and tight analysis of their theoretical properties are important research endeavors. While minimax complexities are known for sequential optimization methods, the theory of parallel optimization methods is surprisingly much less explored, especially in the presence of data, compute and communication heterogeneity.
In the first part of the talk [1], we establish the first optimal time complexities for parallel optimization methods (Rennala SGD and Malenia SGD) that have access to an unbiased stochastic gradient oracle with bounded variance, under the assumption that the workers compute stochastic gradients with different speeds, i.e., we assume compute heterogeneity. We prove lower bounds and develop optimal algorithms that attain them, both in the data homogeneous and heterogeneous regimes.
In the second part of the talk [2], we establish the first optimal time complexities for parallel optimization methods (Shadowheart SGD) that have access to an unbiased stochastic gradient oracle with bounded variance, under the assumption that the workers compute stochastic gradients with different speeds, as before, but under the further assumption that the worker-to-server communication times are nonzero and heterogeneous. We prove lower bounds and develop optimal algorithms that attain them, in the data homogeneous regime only.
Our results have surprising consequences for the literature of asynchronous optimization methods: in contrast with prior attempts to tame compute heterogeneity via ""complete/wild"" compute and update asynchronicity, our methods alternate fast asynchornous computation of a minibatch of stochastic gradients with infrequent synchronous update steps.
[1] Alexander Tyurin and Peter Richtárik. Optimal time complexities of parallel stochastic optimization methods under a fixed computation model, Advances in Neural Information Processing Systems 36 (NeurIPS 2023).
[2] Alexander Tyurin, Marta Pozzi, Ivan Ilin, and Peter Richtárik. Shadowheart SGD: Distributed asynchronous SGD with optimal time complexity under arbitrary computation and communication heterogeneity, arXiv:2402.04785, 2024."Abstract text.

Brief Biography

Peter Richtárik is a professor of Computer Science at the King Abdullah University of Science and Technology (KAUST), Saudi Arabia, where he leads the Optimization and Machine Learning Lab. His research interests lie at the intersection of mathematics, computer science, machine learning, optimization, numerical linear algebra, and high-performance computing. Through his work on randomized and distributed optimization algorithms, he has contributed to the foundations of machine learning, optimization and randomized numerical linear algebra. He is one of the original developers of Federated Learning. Prof Richtárik’s works attracted international awards, including a Best Paper Award at the NeurIPS 2020 Workshop on Scalability, Privacy, and Security in Federated Learning, Distinguished Speaker Award at the 2019 International Conference on Continuous Optimization, SIAM SIGEST Best Paper Award, and the IMA Leslie Fox Prize (three times). Several of his works are among the most read papers published by the SIAM Journal on Optimization and the SIAM Journal on Matrix Analysis and Applications. Prof Richtárik serves as an Area Chair for leading machine learning conferences, including NeurIPS, ICML and ICLR, and is an Action Editor of JMLR, and Associate Editor of Numerische Mathematik and Optimization Methods and Software. In the past, he served as an Action Editor of TMLR and an Area Editor of JOTA.

Contact Person

Di Wang

Event Start

Event End

Location

Abstract

Brief Biography

Contact Person

Events

Physics-based Simulation in Visual Computing

Pretraining Large Language Models on Japanese Supercomputers

The First Optimal Parallel SGD (in the Presence of Data, Compute and Communication Heterogeneity)

CEMSE - Computer, Electrical and Mathematical Sciences and Engineering Division

Biological and Environmental Sciences Engineering Division

Physical Science and Engineering Division

Study

Expanding Knowledge

Student Affairs

Living in KAUST

About KAUST

Latest from KAUST

Computer Science Program