Theoretical Foundations of Communication-Efficient, Robust, and Practical Distributed and Federated Optimization

This thesis investigates seven critical challenges at the intersection of theory and practice, specifically focusing on the fundamental bottlenecks of Federated Learning and distributed optimization and develops novel algorithmic frameworks that provide sharp theoretical guarantees to bridge the gap between heuristic success and mathematical rigor.

Overview

The rapid advancement of Machine Learning and the evolution of optimization have consistently evolved together, as practical needs and theoretical breakthroughs push both areas to progress. While modern large-scale training relies on classical optimization, the unique constraints of distributed systems have necessitated a complete rethink of foundational theory. In this thesis, we address seven critical challenges at the intersection of theory and practice, specifically focusing on the fundamental bottlenecks of Federated Learning and distributed optimization. For each of these challenges, we design novel algorithmic frameworks and provide sharp theoretical guarantees that bridge the gap between heuristic success and mathematical rigor.

Our first contribution resolves a fundamental question in communication efficiency within federated environments by introducing ProxSkip and proving that local gradient steps can lead to provable acceleration, finally establishing a formal foundation for this widely used heuristic. We extend this framework with Variance Reduced ProxSkip, which integrates variance reduction to eliminate the neighborhood error typically observed in stochastic local updates while effectively balancing communication and local computation costs. Our third contribution addresses the scalability and efficiency of these methods under partial participation, demonstrating that local steps still achieve provable communication acceleration even when only a subset of clients is active in each round. In the fourth challenge, we investigate the interplay of server-side dynamics and data selection, proving that server-side stepsizes and sampling without replacement significantly improve convergence in heterogeneous settings.

Building on these insights regarding data selection, our fifth contribution shows that in the context of Random Reshuffling, compressing gradient differences rather than raw gradients leads to superior theoretical and practical performance. We then address system security by proving that Byzantine robustness and partial participation can be achieved simultaneously through a gradient difference clipping mechanism. Finally, we move beyond classical optimization to establish the first theoretical framework for low-rank adaptation via randomized asymmetric chains, offering new insights into the fine-tuning of massive-scale models. Across all contributions, our results provide tighter analyses and more realistic assumptions than prior work, supported by numerical experiments that confirm the sharpness of our theoretical findings.

Presenters

Brief Biography

Grigory Malinovsky is a Ph.D. candidate in Applied Mathematics and Computational Science at King Abdullah University of Science and Technology (KAUST), studying under the supervision of Professor Peter Richtárik in the Optimization and Machine Learning Lab.

Grigory holds a Master of Science in Applied Mathematics and Computational Science from KAUST and a Bachelor's degree in Applied Mathematics and Physics from the Moscow Institute of Physics and Technology, where he completed his thesis under the guidance of Boris Polyak.

During his Ph.D., he completed a six-month internship at Samsung R&D Institute UK, and collaborated during research visits with Sebastian Stich at CISPA, as well as with Samuel Horváth and Eduard Gorbunov at MBZUAI. His work has been published in leading machine learning conferences such as NeurIPS, ICLR, ICML, AISTATS, AAAI, and UAI. Alongside his research, he has contributed to teaching graduate-level optimization courses and actively mentored research interns, leading to multiple joint publications.