About Peter Richtarik

Peter Richtarik
Professor, Computer Science
optimization machine learning big data algorithms

Professor Richtárik specializes in big data optimization and machine learning. He is known for his work on randomized coordinate descent algorithms, stochastic gradient descent and federated learning.

Events

Presented Events

Sep 14 - Sep 20, 2025

From the Ball-Proximal (Broximal) Point Method to Efficient Training of LLMs
Peter Richtarik, Professor, Computer Science
Sep 16, 16:00 - 17:00
B1 L3 R3119
AI machine learning optimization algorithms LLM
This talk introduces the Ball-Proximal Point Method, a new foundational algorithm for non-smooth optimization with surprisingly fast convergence, and Gluon, a new theoretical framework that closes the gap between theory and practice for modern LMO-based deep learning optimizers.

From the Ball-Proximal (Broximal) Point Method to Efficient Training of LLMs
Peter Richtarik, Professor, Computer Science
Sep 15, 12:00 - 13:00
B9 L2 R2325
AI machine learning optimization algorithms LLM
This talk introduces the Ball-Proximal Point Method, a new foundational algorithm for non-smooth optimization with surprisingly fast convergence, and Gluon, a new theoretical framework that closes the gap between theory and practice for modern LMO-based deep learning optimizers.

Aug 31 - Sep 6, 2025

From the Ball-Proximal (Broximal) Point Method to Efficient Training of LLMs
Peter Richtarik, Professor, Computer Science
Sep 4, 12:00 - 13:00
B9 L2 R2325
AI machine learning optimization algorithms LLM
This talk introduces the Ball-Proximal Point Method, a new foundational algorithm for non-smooth optimization with surprisingly fast convergence, and Gluon, a new theoretical framework that closes the gap between theory and practice for modern LMO-based deep learning optimizers.

Sep 11 - Sep 17, 2022

On the resolution of a theoretical question related to the nature of local training in federated learning
Peter Richtarik, Professor, Computer Science
Sep 13, 15:30 - 17:00
B1 L3 R3119
machine learning mathematical optimization communications algorithms
In this talk, I will explain the problem, its solution, and some subsequent work generalizing, extending and improving the ProxSkip method in various ways. We study distributed optimization methods based on the local training (LT) paradigm - achieving improved communication efficiency by performing richer local gradient-based training on the clients before parameter averaging - which is of key importance in federated learning. Looking back at the progress of the field in the last decade, we identify 5 generations of LT methods: 1) heuristic, 2) homogeneous, 3) sublinear, 4) linear, and 5) accelerated. The 5th generation, initiated by the ProxSkip method of Mishchenko et al. (2022) and its analysis, is characterized by the first theoretical confirmation that LT is a communication acceleration mechanism.

Mar 27 - Apr 2, 2022

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
Peter Richtarik, Professor, Computer Science
Mar 31, 12:00 - 13:00
B9 L2 H2
We introduce ProxSkip, a surprisingly simple and provably efficient method for minimizing the sum of a smooth (ƒ) and an expensive nonsmooth proximable (ψ) function.

Mar 20 - Mar 26, 2022

Permutation compressors for provably faster distributed nonconvex optimization
Peter Richtarik, Professor, Computer Science
Mar 21, 12:00 - 13:00
B9 R2322 H1
We study the MARINA method of Gorbunov et al. (ICML 2021) - the current state-of-the-art distributed non-convex optimization method in terms of theoretical communication complexity. Theoretical superiority of this method can be largely attributed to two sources: the use of a carefully engineered biased stochastic gradient estimator, which leads to a reduction in the number of communication rounds, and the reliance on independent stochastic communication compression operators, which leads to a reduction in the number of transmitted bits within each communication round.

Oct 31 - Nov 6, 2021

EF21: A new, simpler, theoretically better, and practically faster error feedback
Peter Richtarik, Professor, Computer Science
Nov 1, 12:00 - 13:00
B9 R2322 H1
Error feedback (EF), also known as error compensation, is an immensely popular convergence stabilization mechanism in the context of distributed training of supervised machine learning models enhanced by the use of contractive communication compression mechanisms, such as Top-k. First proposed by Seide et al. (2014) as a heuristic, EF resisted any theoretical understanding until recently [Stich et al., 2018, Alistarh et al., 2018].

Apr 18 - Apr 24, 2021

Distributed second order methods with fast rates and compressed communication
Peter Richtarik, Professor, Computer Science
Apr 22, 12:00 - 13:00
KAUST
We develop several new communication-efficient second-order methods for distributed optimization. Our first method, NEWTON-STAR, is a variant of Newton's method from which it inherits its fast local quadratic rate. However, unlike Newton's method, NEWTON-STAR enjoys the same per-iteration communication cost as gradient descent. While this method is impractical as it relies on the use of certain unknown parameters characterizing the Hessian of the objective function at the optimum, it serves as the starting point which enables us to design practical variants thereof with strong theoretical guarantees. In particular, we design a stochastic sparsification strategy for learning the unknown parameters in an iterative fashion in a communication-efficient manner. Applying this strategy to NEWTON-STAR leads to our next method, NEWTON-LEARN, for which we prove local linear and superlinear rates independent of the condition number. When applicable, this method can have dramatically superior convergence behavior when compared to state-of-the-art methods. Finally, we develop a globalization strategy using cubic regularization which leads to our next method, CUBIC-NEWTON-LEARN, for which we prove global sublinear and linear convergence rates, and a fast superlinear rate. Our results are supported with experimental results on real datasets, and show several orders of magnitude improvement on baseline and state-of-the-art methods in terms of communication complexity.

Feb 4 - Feb 10, 2018

KAUST Research Workshop on Optimization and Big Data
Peter Richtarik, Professor, Computer Science
Feb 5, 08:00 - Feb 7, 05:00
B19 L3 H2
optimization machine learning Social Network Analysis asynchronous algorithms
The age of "big data" is here: data of unprecedented sizes is becoming ubiquitous, which brings new challenges and new opportunities. With this comes the need to solve optimization problems of unprecedented sizes.