Error Feedback for Communication-Efficient First and Second-Order Distributed Optimization: Theory and Practical Implementation

This seminar will discuss advancements in Federated Learning, including theoretical improvements to the Error Feedback method (EF21) for communication-efficient distributed training and the development of significantly more practical and efficient implementations of the Federated Newton Learn (FedNL) algorithm.

Overview

Federated Learning (FL) is an emerging paradigm that enables intelligent agents to collaboratively train machine learning models in a distributed fashion, without sharing their local data. If your interests lie in algorithms for distributed machine learning training, theoretical advancements, or the co-design of algorithms and their implementations, we invite you to attend the upcoming CS seminar. Whether you are more theoretically inclined, practically focused, or somewhere in between, this session will have something for you.

In the first part of the talk, I will discuss Error Feedback (EF), a widely used and highly effective strategy for addressing convergence issues in distributed training algorithms such as gradient descent (GD) and stochastic gradient descent (SGD), especially when they are paired with aggressive communication compression techniques such as TopK. Initially introduced by Seide et al. (2014), EF has since been the subject of intensive theoretical study, though many questions remain open. We focus on EF21, a modern error feedback method that offers state-of-the-art theoretical guarantees under minimal assumptions and excels in practice. A key contribution of our work is improving EF21's communication complexity from a dependence on the quadratic mean of the clients' smoothness constants to a dependence on their arithmetic mean, which is never larger (by the quadratic mean-arithmetic mean inequality) and is often substantially smaller, for example when a single client has a much larger smoothness constant than the rest. This constitutes a meaningful theoretical refinement in distributed non-convex optimization with contractive compressors.
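To make the mechanism concrete, below is a minimal NumPy sketch of an EF21-style update paired with a TopK compressor. It is an illustrative sketch only: the full-gradient setting, the function names, and the single step size are simplifying assumptions made here, not the implementation analyzed in the paper.

    import numpy as np

    def topk(v, k):
        # TopK contractive compressor: keep the k largest-magnitude entries, zero out the rest.
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
        return out

    def ef21_gd(local_grads, x0, gamma, k, iters):
        # Minimal EF21-style loop (full-batch variant, for illustration).
        # local_grads[i](x) returns node i's local gradient at x.
        n = len(local_grads)
        x = x0.copy()
        g = [local_grads[i](x) for i in range(n)]      # full-precision initialization of the estimates
        g_avg = sum(g) / n                             # server-side average of the estimates
        for _ in range(iters):
            x = x - gamma * g_avg                      # server step using the aggregated estimate
            for i in range(n):
                c = topk(local_grads[i](x) - g[i], k)  # node i sends only this compressed correction
                g[i] = g[i] + c
                g_avg = g_avg + c / n                  # server maintains the average incrementally
        return x

After the initial round, each node transmits only k coordinates (plus their indices) per iteration, which is the source of the communication savings.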

In the second part of the talk, I will discuss a communication-efficient distributed variant of the classical Newton's method: Federated Newton Learn (FedNL), introduced by Safaryan et al. FedNL can be seen as a coupling of Newton's method with EF21, in which the nodes learn their local Hessians through compressed updates. While conceptually powerful, the original implementation of FedNL has three practical drawbacks: (1) it requires 4.8 hours to launch a single experiment on a server-grade workstation; (2) it only simulates multi-node setups; (3) it is impractical for resource-constrained environments. To address these limitations, we developed self-contained, compute-optimized implementations of FedNL-LS and FedNL-PP suitable for single-node and multi-node scenarios. These achieve reductions of up to 1000x in wall-clock time, making FedNL practical and efficient. For shallow ML models, our implementation outperforms leading alternatives: (i) CVXPY in single-node settings, and (ii) Apache Spark and Ray/Scikit-learn in multi-node scenarios.
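For intuition, here is a very rough Python sketch of the core FedNL idea: a Newton step driven by locally learned, compressed Hessian estimates. The exact gradient aggregation, the mu*I regularization of the aggregated Hessian, and the entrywise matrix TopK compressor are simplifying assumptions made for this illustration; the paper offers several options for these choices.

    import numpy as np

    def topk_matrix(M, k):
        # TopK compressor applied to a matrix, treated as a flat vector of entries.
        flat = M.ravel()
        out = np.zeros_like(flat)
        idx = np.argpartition(np.abs(flat), -k)[-k:]
        out[idx] = flat[idx]
        return out.reshape(M.shape)

    def fednl_sketch(local_grad, local_hess, x0, k, mu, iters):
        # Rough FedNL-style loop: Newton step with locally learned, compressed Hessian estimates.
        # local_grad[i](x) and local_hess[i](x) are node i's gradient and Hessian oracles.
        n, d = len(local_grad), x0.size
        x = x0.copy()
        H = [local_hess[i](x) for i in range(n)]       # full-precision initialization of the estimates
        for _ in range(iters):
            grad = sum(local_grad[i](x) for i in range(n)) / n   # gradients kept exact in this sketch
            H_avg = sum(H) / n
            # Regularize the aggregated estimate so the Newton system is solvable (one possible choice).
            x = x - np.linalg.solve(H_avg + mu * np.eye(d), grad)
            for i in range(n):
                # Each node communicates only a compressed correction to its Hessian estimate.
                H[i] = H[i] + topk_matrix(local_hess[i](x) - H[i], k)
        return x

As in EF21, the per-round communication is dominated by the k transmitted Hessian entries rather than the full d-by-d matrix.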

Presenters

Brief Biography

Konstantin Burlachenko is a Ph.D. candidate at the KAUST Optimization and Machine Learning Lab under the supervision of Professor Peter Richtarik. Before joining KAUST, Konstantin worked at several prominent Moscow companies, such as Huawei, NVIDIA, and Yandex. He holds a master's degree in computer science from Bauman Moscow State Technical University, Russia.

After graduating, he worked as a Senior Engineer for Acronis, Yandex, and NVIDIA, and as a Principal Engineer for Huawei. Konstantin attended the Non-Degree Opportunity program at Stanford between 2015 and 2019 and obtained:

One of his sporting achievements is the title of Candidate Master of Sport in chess.