Better Methods and Theory for Federated Learning: Compression, Client Selection, and Heterogeneity

Building 5, Level 5, Room 5209


Federated learning (FL) is an emerging machine learning paradigm in which multiple clients, e.g., mobile phones, have an incentive to collaborate on solving a machine learning problem, coordinated by a central server. FL was proposed in 2016 by Konečný et al. and McMahan et al. as a viable privacy-preserving alternative to traditional centralized machine learning since, by construction, the training data points are decentralized and never transferred by the clients to a central server. Therefore, to a certain degree, FL mitigates the privacy risks associated with centralized data collection.

Unfortunately, optimization for FL faces several specific issues that centralized optimization usually does not need to handle. In this thesis, we identify several of these challenges and propose new methods and algorithms to address them, with the ultimate goal of enabling practical FL solutions supported by mathematically rigorous guarantees. In particular, in the first four chapters, we focus on the communication bottleneck and devise novel compression mechanisms and tools that can provably accelerate the training process. In the fifth chapter, we address another significant challenge of FL: partial participation of clients in each round of the training process. More concretely, we propose the first importance sampling strategy for clients that is compatible with two core privacy requirements of FL: secure aggregation and statelessness of clients. The sixth chapter is dedicated to another challenge in the cross-device FL setting: system heterogeneity, i.e., the diversity in clients' processing capabilities and network bandwidth, and the communication overhead caused by slow connections. To tackle this, we introduce the ordered dropout (OD) mechanism. OD promotes an ordered, nested representation of knowledge in neural networks and enables the extraction of lower-footprint sub-models without retraining, which offers fair and accurate learning in this challenging FL setting. Lastly, in the seventh chapter, we study several key algorithmic ingredients behind some of the most popular methods for cross-device FL that aim to tackle heterogeneity and communication bottlenecks. In particular, we propose a general framework for analyzing methods that employ all these techniques simultaneously, which helps us better understand their combined effect. Our approach identifies several inconsistencies and enables better utilization of these components, including the popular practice of running multiple local training steps before aggregation.
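
To make the idea of an ordered, nested representation more concrete, the following is a minimal sketch of how a lower-footprint sub-model might be extracted from a trained fully connected network by keeping only the leading hidden units of each layer. It is an illustration under stated assumptions, not the implementation from the thesis: the function name `extract_submodel`, the width ratio `p`, and the list-of-matrices weight layout are all hypothetical, and the sketch covers only extraction, not the training procedure.

```python
import math


def extract_submodel(weights, p):
    """Keep the leading fraction p of hidden units in every layer.

    Assumption (hypothetical layout): `weights` is a list of matrices,
    one per layer, each stored as a list of rows with shape
    (out_dim, in_dim). The network input and output sizes are left
    unchanged; only hidden widths are reduced.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    last = len(weights) - 1
    sub = []
    for i, w in enumerate(weights):
        out_dim, in_dim = len(w), len(w[0])
        # Hidden widths shrink to ceil(p * width); ends stay full size.
        keep_out = out_dim if i == last else math.ceil(p * out_dim)
        keep_in = in_dim if i == 0 else math.ceil(p * in_dim)
        # Slicing the leading rows and columns gives the nested sub-block.
        sub.append([row[:keep_in] for row in w[:keep_out]])
    return sub


# Toy network: 3 inputs -> 4 hidden units -> 2 outputs.
weights = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]],
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
]
half = extract_submodel(weights, 0.5)
```

Because every sub-model is a leading slice of the full weights, the sub-model for a smaller `p` is contained in the sub-model for any larger `p`, which is the nesting property that would allow weaker devices to run a smaller model taken from the same parameters.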

Brief Biography

Samuel Horváth is a Ph.D. candidate at the Visual Computing Center (VCC) at King Abdullah University of Science and Technology (KAUST), studying under the supervision of Professor Peter Richtárik in his research group. During his Ph.D., he interned at Amazon, hosted by Cédric Archambeau; at Samsung, hosted by Stefanos Laskaridis, Mario Almeida, and Nicholas Lane; and at Meta, hosted by Michael Rabbat. His research interests are mainly in non-convex and convex optimization for machine learning, in particular federated learning. Before his Ph.D., he was an undergraduate student in Financial Mathematics at Comenius University. He received the Best Poster Award at the Data Science Summer School 2018 at École Polytechnique, and he is a recipient of the Best Paper Award at the Federated Learning Workshop at NeurIPS 2020.
