Multi-Scale GPU Resource Management for Deep Learning

Graduate Seminar

Event Start

2021-03-29 - 18:30

Event End

2021-03-29 - 19:30

Location

KAUST

Mosharaf Chowdhury, Morris Wellman, Assistant Professor of CSE at the University of Michigan, Ann Arbor

‌Abstract

GPUs have emerged as a popular choice for deep learning. To deal with ever-growing datasets, it is also common to use multiple GPUs in parallel for distributed deep learning. Although achieving cost-effectiveness in these clusters relies on efficient sharing, modern GPU hardware, deep learning frameworks, and cluster managers are not designed for efficient, fine-grained sharing of GPU resources. In this talk, I will present our recent works on efficient GPU resource management, both within a single GPU and across many GPUs in a cluster for hyperparameter tuning, training, and inference. The common thread across all our works is leveraging the interplay between short-term predictability and long-term unpredictability of deep learning workloads. I'll discuss how to leverage the former to address the latter and improve utilization, performance, and fairness in shared GPU clusters. All our works are open-source and available at https://github.com/symbioticlab.

Brief Biography

Mosharaf Chowdhury is a Morris Wellman assistant professor of CSE at the University of Michigan, Ann Arbor. His current research focuses on application-infrastructure symbiosis across different layers of software and hardware stacks. Mosharaf invented coflows and is a co-creator of Apache Spark. Software artifacts from his research have been deployed in Microsoft, Facebook, Google, and Amazon datacenters. He has received an NSF CAREER award, the 2015 ACM SIGCOMM doctoral dissertation award, best paper awards at NSDI and ATC, multiple faculty fellowships and awards from Google, VMware, and Alibaba, as well as a Facebook fellowship and a Cheriton Scholarship. He received his PhD from the AMPLab at UC Berkeley in 2015.

Contact Person

Ivan Viola

Related Persons

Ivan Viola

Professor, Computer Science

Professors

Event Start

Event End

Location

‌Abstract

Brief Biography

Contact Person

Related Persons

Ivan Viola

Events

CEMSE - Computer, Electrical and Mathematical Sciences and Engineering Division

Biological and Environmental Sciences Engineering Division

Physical Science and Engineering Division

Study

Expanding Knowledge

Student Affairs

Living in KAUST

About KAUST

Latest from KAUST

Computer Science Program