RDMA at Hyperscale: Experience and Future Directions

  • Wei Bai, Principal Software Research Architect, NVIDIA
-

B9 L2 H2

Remote Direct Memory Access (RDMA) has long been recognized as a powerful technology for high-performance computing and data-intensive applications. In this talk, I will present our experience in deploying intra-region RDMA to support storage workloads in Azure.

Overview

Abstract

Remote Direct Memory Access (RDMA) has long been recognized as a powerful technology for high-performance computing and data-intensive applications. In this talk, I will present our experience in deploying intra-region RDMA to support storage workloads in Azure. Today, around 70% of traffic in Azure is RDMA and intra-region RDMA is supported in all Azure public regions. I will also discuss some open problems we see going forward and our early efforts in these directions.

Brief Biography

Wei Bai recently joined NVIDIA as a principal software research architect. Before that, he was a senior researcher in the Networking Research Group at Microsoft Research Redmond. He received his PhD degree in computer science from Hong Kong University of Science and Technology. Wei is broadly interested in computer networking with a special focus on data center networking. His research work has been published in many top conferences and journals, such as SIGCOMM, NSDI, and IEEE/ACM Transactions on Networking. Wei also has rich experience in developing and operating production cloud networks. Currently, he is mainly focusing on network infrastructure to support hyperscale AI workloads.

Presenters

Wei Bai, Principal Software Research Architect, NVIDIA