Distributed second order methods with fast rates and compressed communication
Overview
Abstract
We develop several new communication-efficient second-order methods for distributed optimization. Our first method, NEWTON-STAR, is a variant of Newton's method from which it inherits its fast local quadratic rate. However, unlike Newton's method, NEWTON-STAR enjoys the same per-iteration communication cost as gradient descent. While this method is impractical as it relies on certain unknown parameters characterizing the Hessian of the objective function at the optimum, it serves as the starting point which enables us to design practical variants thereof with strong theoretical guarantees. In particular, we design a stochastic sparsification strategy for learning the unknown parameters iteratively and in a communication-efficient manner. Applying this strategy to NEWTON-STAR leads to our next method, NEWTON-LEARN, for which we prove local linear and superlinear rates independent of the condition number. When applicable, this method can have dramatically superior convergence behavior when compared to state-of-the-art methods. Finally, we develop a globalization strategy using cubic regularization which leads to our next method, CUBIC-NEWTON-LEARN, for which we prove global sublinear and linear convergence rates, as well as a fast superlinear rate. Our theoretical results are supported by experiments on real datasets, which show several orders of magnitude improvement over baseline and state-of-the-art methods in terms of communication complexity.
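To make the central idea concrete, here is a minimal sketch, in LaTeX, of the NEWTON-STAR update as the abstract describes it. The specific formula below is an illustration inferred from that description (Newton's method with the Hessian frozen at the unknown optimum x*), not a verbatim statement of the paper's algorithm; we assume n nodes, each holding a local function f_i, with the global objective being their average.

% Sketch of the NEWTON-STAR iteration as described in the abstract.
% Assumed form, reconstructed from the description above; details may
% differ from the algorithm in the paper.
\[
  f(x) \;=\; \frac{1}{n}\sum_{i=1}^{n} f_i(x), \qquad
  x^{k+1} \;=\; x^{k} \;-\; \bigl(\nabla^2 f(x^\star)\bigr)^{-1}
  \frac{1}{n}\sum_{i=1}^{n} \nabla f_i\bigl(x^{k}\bigr).
\]
% Because the Hessian is evaluated once at the (unknown) optimum x^\star
% and then kept fixed, each node only needs to communicate its current
% gradient, giving the same O(d) per-iteration communication cost as
% gradient descent.

Learning the entries of the Hessian at the optimum on the fly, via the sparsified updates mentioned in the abstract, is what turns this idealized scheme into the practical NEWTON-LEARN method.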
Brief Biography
Peter Richtárik is a professor of Computer Science at King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia, where he leads the Optimization and Machine Learning Lab. At KAUST, he has a courtesy affiliation with the Applied Mathematics and Computational Sciences program and the Statistics program, and is a member of the Visual Computing Center and the Extreme Computing Research Center. Prof Richtárik is a founding member and a Fellow of the Alan Turing Institute (UK National Institute for Data Science and Artificial Intelligence), and an EPSRC Fellow in Mathematical Sciences. Prior to joining KAUST, he was an Associate Professor of Mathematics at the University of Edinburgh, and held postdoctoral and visiting positions at Université Catholique de Louvain, Belgium, and the University of California, Berkeley, USA. He received his PhD from Cornell University, USA.
Prof Richtárik’s research interests lie at the intersection of mathematics, computer science, machine learning, optimization, numerical linear algebra, and high-performance computing. Through his work on randomized and distributed optimization algorithms, he has contributed to the foundations of machine learning, optimization, and randomized numerical linear algebra. He is one of the original developers of Federated Learning, a new subfield of artificial intelligence whose goal is to train machine learning models over private data stored across a large number of heterogeneous devices, such as mobile phones or hospitals, in an efficient manner and without compromising user privacy. In an October 2020 Forbes article, Federated Learning was listed, alongside self-supervised learning and transformers, as one of three emerging areas that will shape the next generation of Artificial Intelligence technologies.
Prof Richtárik’s articles have attracted international awards, including a Best Paper Award at the NeurIPS 2020 Workshop on Scalability, Privacy, and Security in Federated Learning, the Distinguished Speaker Award at the 2019 International Conference on Continuous Optimization, the SIAM SIGEST Best Paper Award (joint with O. Fercoq), and the IMA Leslie Fox Prize (second prize, three times, awarded to two of his students and a postdoc). Several of his works are among the most read papers published by the SIAM Journal on Optimization and the SIAM Journal on Matrix Analysis and Applications. Prof Richtárik serves as an Area Chair for leading machine learning conferences, including NeurIPS, ICML, and ICLR, and is an Associate Editor of the journal Optimization Methods and Software.