Accelerated Deep Learning via Efficient, Compressed and Managed Communication

KAUST Conference on Artificial Intelligence

Research Conference

Event Start

2021-04-28 - 15:00

Event End

2021-04-28 - 15:20

Location

KAUST

Marco Canini

Associate Professor, Computer Science

Abstract

Scaling deep learning to a large cluster of workers is challenging because of the high communication overhead that data parallelism entails. This talk describes our efforts to rein in the communication bottlenecks of distributed deep learning. We describe SwitchML, the state-of-the-art in-network aggregation system for collective communication using programmable network switches. We introduce OmniReduce, an efficient streaming aggregation system that exploits sparsity to maximize effective bandwidth use. We touch on our work to develop compressed gradient communication algorithms that perform efficiently and adapt to network conditions. Lastly, we take a broad look at the challenges of accelerating decentralized training in the federated learning setting, where heterogeneity is an intrinsic property of the environment.
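
As a rough illustration of why sparsity and compression reduce communication in data-parallel training, the minimal Python sketch below applies top-k sparsification, a common gradient compression technique, before summing gradients across workers. It is not the design of SwitchML or OmniReduce; the function names `top_k_sparsify` and `aggregate_sparse` and the toy gradients are hypothetical and chosen only for this example.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a dense gradient as {index: value}."""
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return {i: grad[i] for i in top}


def aggregate_sparse(sparse_grads, size):
    """Sum the sparse gradients from all workers into one dense vector."""
    total = [0.0] * size
    for sparse_grad in sparse_grads:
        for i, value in sparse_grad.items():
            total[i] += value
    return total


# Toy gradients from two workers (made-up values for illustration only).
workers = [
    [0.5, -0.01, 0.0, 2.0],
    [0.02, 1.5, 0.0, -1.0],
]

# Each worker transmits only its 2 largest-magnitude entries.
sparse = [top_k_sparsify(g, k=2) for g in workers]
summed = aggregate_sparse(sparse, size=4)

# Number of gradient values actually sent, versus 8 for the dense case.
sent = sum(len(s) for s in sparse)
```

In this toy case each worker sends two values instead of four, at the cost of dropping the small entries from the aggregated result. Practical schemes typically also transmit indices and compensate for the dropped values, which this sketch omits.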

Brief Biography

Marco does not know what the next big thing will be. But he's sure that our next-gen computing and networking infrastructure must be a viable platform for it and avoid stifling innovation. Marco's research spans a number of areas in computer systems, including distributed systems, large-scale/cloud computing and computer networking with emphasis on programmable networks. His current focus is on designing better systems support for AI/ML and providing practical implementations deployable in the real world.

Marco is an associate professor in Computer Science at KAUST. Marco obtained his Ph.D. in computer science and engineering from the University of Genoa in 2009 after spending the last year as a visiting student at the University of Cambridge. He was a postdoctoral researcher at EPFL and a senior research scientist at Deutsche Telekom Innovation Labs & TU Berlin. Before joining KAUST, he was an assistant professor at UCLouvain. He also held positions at Intel, Microsoft and Google.

Contact Person

Bernard Ghanem

