Peter Richtarik

About Peter Richtarik

Professor, Computer Science

optimization, machine learning, big data, algorithms

Professor Richtárik specializes in big data optimization and machine learning. He is known for his work on randomized coordinate descent algorithms, stochastic gradient descent and federated learning.

Events

Presented Events

Sep 14 - Sep 20, 2025

From the Ball-Proximal (Broximal) Point Method to Efficient Training of LLMs

Peter Richtarik, Professor, Computer Science

Sep 16, 16:00 - 17:00

B1 L3 R3119

AI, machine learning, optimization, algorithms, LLM

This talk introduces the Ball-Proximal Point Method, a new foundational algorithm for non-smooth optimization with surprisingly fast convergence, and Gluon, a new theoretical framework that closes the gap between theory and practice for modern LMO-based deep learning optimizers.

From the Ball-Proximal (Broximal) Point Method to Efficient Training of LLMs

Peter Richtarik, Professor, Computer Science

Sep 15, 12:00 - 13:00

B9 L2 R2325

AI, machine learning, optimization, algorithms, LLM

This talk introduces the Ball-Proximal Point Method, a new foundational algorithm for non-smooth optimization with surprisingly fast convergence, and Gluon, a new theoretical framework that closes the gap between theory and practice for modern LMO-based deep learning optimizers.

Aug 31 - Sep 6, 2025

From the Ball-Proximal (Broximal) Point Method to Efficient Training of LLMs

Peter Richtarik, Professor, Computer Science

Sep 4, 12:00 - 13:00

B9 L2 R2325

AI, machine learning, optimization, algorithms, LLM

This talk introduces the Ball-Proximal Point Method, a new foundational algorithm for non-smooth optimization with surprisingly fast convergence, and Gluon, a new theoretical framework that closes the gap between theory and practice for modern LMO-based deep learning optimizers.

Sep 11 - Sep 17, 2022

On the resolution of a theoretical question related to the nature of local training in federated learning

Peter Richtarik, Professor, Computer Science

Sep 13, 15:30 - 17:00

B1 L3 R3119

machine learning, mathematical optimization, communications, algorithms

We study distributed optimization methods based on the local training (LT) paradigm - achieving improved communication efficiency by performing richer local gradient-based training on the clients before parameter averaging - which is of key importance in federated learning. Looking back at the progress of the field in the last decade, we identify five generations of LT methods: 1) heuristic, 2) homogeneous, 3) sublinear, 4) linear, and 5) accelerated. The 5th generation, initiated by the ProxSkip method of Mishchenko et al. (2022) and its analysis, is characterized by the first theoretical confirmation that LT is a communication acceleration mechanism. In this talk, I will explain the problem, its solution, and some subsequent work generalizing, extending and improving the ProxSkip method in various ways.
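As a rough illustration of the local training idea in this abstract, the sketch below runs several local gradient steps on each client before the server averages the parameters. It is a toy written for this listing, not ProxSkip or any other method from the talk: the quadratic client losses, the function names (`local_gd`, `average`, `local_training`), and all parameter values are assumptions.

```python
from typing import List

Vector = List[float]


def local_gd(x: Vector, b: Vector, step: float, local_steps: int) -> Vector:
    """Run `local_steps` gradient steps on the toy client loss 0.5 * ||x - b||^2."""
    x = list(x)
    for _ in range(local_steps):
        # The gradient of 0.5 * ||x - b||^2 is (x - b).
        x = [xi - step * (xi - bi) for xi, bi in zip(x, b)]
    return x


def average(vectors: List[Vector]) -> Vector:
    """Coordinate-wise mean: the server's parameter averaging step."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def local_training(client_targets: List[Vector], x0: Vector,
                   step: float = 0.1, local_steps: int = 5,
                   rounds: int = 50) -> Vector:
    """Alternate local training on every client with one averaging round."""
    x = list(x0)
    for _ in range(rounds):
        # Each client starts from the shared model and trains locally.
        client_models = [local_gd(x, b, step, local_steps) for b in client_targets]
        # One communication round: the server averages the client models.
        x = average(client_models)
    return x


# Hypothetical data: three clients, each with its own target vector.
targets = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
x_final = local_training(targets, x0=[0.0, 0.0])
```

In this toy every client loss has the same curvature, so averaging recovers the minimizer of the average loss (the mean of the client targets). With heterogeneous client losses, plain local steps of this kind generally need not do so.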

Mar 27 - Apr 2, 2022

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

Peter Richtarik, Professor, Computer Science

Mar 31, 12:00 - 13:00

B9 L2 H2

We introduce ProxSkip, a surprisingly simple and provably efficient method for minimizing the sum of a smooth (ƒ) and an expensive nonsmooth proximable (ψ) function.
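The abstract does not spell out the update rule, so the following is only a sketch based on one common statement of ProxSkip, and its details should be treated as assumptions: take a gradient step shifted by a control variate `h`, apply the proximal operator only with probability `p`, then update `h`. The toy problem (ƒ(x) = 0.5‖x − b‖², ψ = λ‖x‖₁), the function names, and the parameter values are all illustrative choices.

```python
import random
from typing import List, Tuple

Vector = List[float]


def soft_threshold(v: Vector, tau: float) -> Vector:
    """Proximal operator of tau * ||.||_1, standing in for an expensive prox."""
    return [max(abs(vi) - tau, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def proxskip(b: Vector, lam: float, step: float = 0.5, p: float = 0.5,
             iters: int = 2000, seed: int = 0) -> Tuple[Vector, int]:
    """Sketch of ProxSkip on f(x) = 0.5 * ||x - b||^2 plus lam * ||x||_1."""
    rng = random.Random(seed)
    x = [0.0] * len(b)
    h = [0.0] * len(b)  # control variate, meant to track the gradient at the solution
    prox_calls = 0
    for _ in range(iters):
        grad = [xi - bi for xi, bi in zip(x, b)]
        # Gradient step corrected by the control variate.
        x_hat = [xi - step * (gi - hi) for xi, gi, hi in zip(x, grad, h)]
        if rng.random() < p:
            # Occasionally evaluate the prox, with step size scaled by 1/p.
            shifted = [xh - (step / p) * hi for xh, hi in zip(x_hat, h)]
            x = soft_threshold(shifted, lam * step / p)
            prox_calls += 1
        else:
            # Otherwise skip the prox entirely.
            x = x_hat
        # The control variate only changes on iterations where the prox was applied.
        h = [hi + (p / step) * (xi - xh) for hi, xi, xh in zip(h, x, x_hat)]
    return x, prox_calls


# Hypothetical usage: for this toy problem the minimizer is soft_threshold(b, lam).
x, prox_calls = proxskip([3.0, -0.2, 1.5], lam=1.0)
```

With `p = 0.5` the prox is evaluated on roughly half of the iterations; smaller values of `p` skip it more often.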

Mar 20 - Mar 26, 2022

Permutation compressors for provably faster distributed nonconvex optimization

Peter Richtarik, Professor, Computer Science

Mar 21, 12:00 - 13:00

B9 R2322 H1

We study the MARINA method of Gorbunov et al. (ICML 2021) - the current state-of-the-art distributed non-convex optimization method in terms of theoretical communication complexity. Theoretical superiority of this method can be largely attributed to two sources: the use of a carefully engineered biased stochastic gradient estimator, which leads to a reduction in the number of communication rounds, and the reliance on independent stochastic communication compression operators, which leads to a reduction in the number of transmitted bits within each communication round.

Oct 31 - Nov 6, 2021

EF21: A new, simpler, theoretically better, and practically faster error feedback

Peter Richtarik, Professor, Computer Science

Nov 1, 12:00 - 13:00

B9 R2322 H1

Error feedback (EF), also known as error compensation, is an immensely popular convergence stabilization mechanism in the context of distributed training of supervised machine learning models enhanced by the use of contractive communication compression mechanisms, such as Top-k. First proposed by Seide et al. (2014) as a heuristic, EF resisted any theoretical understanding until recently [Stich et al., 2018; Alistarh et al., 2018].
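To make the two ingredients named here concrete, the sketch below implements a Top-k compressor and a single-node version of the classic error feedback loop: the part of the update that compression discards is stored and added back before the next compression. This illustrates the original EF mechanism only, not EF21; the toy objective, function names, and parameter values are assumptions.

```python
from typing import List

Vector = List[float]


def top_k(v: Vector, k: int) -> Vector:
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    keep = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k]
    out = [0.0] * len(v)
    for i in keep:
        out[i] = v[i]
    return out


def ef_gradient_descent(b: Vector, k: int = 1, step: float = 0.5,
                        iters: int = 200) -> Vector:
    """Compressed gradient descent with error feedback on 0.5 * ||x - b||^2."""
    x = [0.0] * len(b)
    err = [0.0] * len(b)  # compression error accumulated so far
    for _ in range(iters):
        grad = [xi - bi for xi, bi in zip(x, b)]
        # Add the stored error back in before compressing.
        corrected = [ei + step * gi for ei, gi in zip(err, grad)]
        # Only this sparse vector would be communicated.
        update = top_k(corrected, k)
        # Remember what the compressor dropped.
        err = [ci - ui for ci, ui in zip(corrected, update)]
        x = [xi - ui for xi, ui in zip(x, update)]
    return x


# Hypothetical usage: send one coordinate per step, yet still approach b.
sparse = top_k([0.1, -3.0, 2.0], 2)
x_ef = ef_gradient_descent([3.0, -0.2, 1.5])
```

Even though only one coordinate is applied per iteration, the stored error ensures the dropped coordinates are eventually accounted for in this toy run.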

Apr 18 - Apr 24, 2021

Distributed second order methods with fast rates and compressed communication

Peter Richtarik, Professor, Computer Science

Apr 22, 12:00 - 13:00

KAUST

We develop several new communication-efficient second-order methods for distributed optimization. Our first method, NEWTON-STAR, is a variant of Newton's method from which it inherits its fast local quadratic rate. However, unlike Newton's method, NEWTON-STAR enjoys the same per-iteration communication cost as gradient descent. While this method is impractical as it relies on the use of certain unknown parameters characterizing the Hessian of the objective function at the optimum, it serves as the starting point which enables us to design practical variants thereof with strong theoretical guarantees. In particular, we design a stochastic sparsification strategy for learning the unknown parameters in an iterative fashion in a communication-efficient manner. Applying this strategy to NEWTON-STAR leads to our next method, NEWTON-LEARN, for which we prove local linear and superlinear rates independent of the condition number. When applicable, this method can have dramatically superior convergence behavior when compared to state-of-the-art methods. Finally, we develop a globalization strategy using cubic regularization which leads to our next method, CUBIC-NEWTON-LEARN, for which we prove global sublinear and linear convergence rates, and a fast superlinear rate. Our results are supported with experimental results on real datasets, and show several orders of magnitude improvement over baseline and state-of-the-art methods in terms of communication complexity.

Feb 4 - Feb 10, 2018

KAUST Research Workshop on Optimization and Big Data

Peter Richtarik, Professor, Computer Science

Feb 5, 08:00 - Feb 7, 05:00

B19 L3 H2

optimization, machine learning, Social Network Analysis, asynchronous algorithms

The age of "big data" is here: data of unprecedented sizes is becoming ubiquitous, which brings new challenges and new opportunities. With this comes the need to solve optimization problems of unprecedented sizes.