Building Latency-Critical Applications for the Future Datacenter

  • Marios Kogias, Assistant Professor, Computing Department, Imperial College, London

B9 L2 H2


Overview

Abstract

Datacenters are the cornerstone of our digital lives: they are effectively the other end of our smartphones. From an infrastructure point of view, they started as a scale-out exercise built on commodity off-the-shelf hardware, but in recent years we have observed a shift away from that paradigm with the emergence of increasingly fast network and storage I/O devices, programmable accelerators, and new fast interconnects. These changes challenge the way existing mechanisms for operating systems, networking, and distributed systems have been designed, and push academia and industry to rethink how to build datacenter systems.

In this talk, I will focus on how the emergence of programmable network devices has led to new designs for serving microsecond-scale Remote Procedure Calls (RPCs) with tail-tolerance and fault-tolerance guarantees, and how these designs open up new research opportunities by shifting the existing performance bottlenecks. First, I will introduce R2P2, a new transport protocol for datacenter RPCs that enables in-network policy enforcement, specifically RPC scheduling. Then, I will present HovercRaft, a novel way to seamlessly offload consensus to programmable switches, and show how HovercRaft shifts the State Machine Replication (SMR) bottleneck from the consensus layer to execution. Finally, I will describe how Dorad tackles this last challenge by enabling deterministic parallel execution of latency-critical, microsecond-scale RPCs.

Brief Biography

Marios Kogias is an Assistant Professor at Imperial College London. His research focuses on datacenter systems and networking. Before Imperial, he was a researcher in the Confidential Computing Group at Azure Research in Cambridge, UK. His PhD thesis received the Dennis M. Ritchie Doctoral Dissertation Award, the honourable mention for the Roger Needham PhD award, and the ABB Dissertation Award. He was an IBM PhD Fellow and won the best student paper award at EuroSys 2020. He has also held positions at Microsoft Research Redmond, Google, and CERN.

Presenters

Marios Kogias, Assistant Professor, Computing Department, Imperial College, London