Building Better Datacenters - The Quest for Low Latency

  • Simon Peter, Assistant professor, Computer Science, University of Texas, Austin

KAUST


Overview

Abstract

As datacenter applications grow in number and complexity, datacenter-internal service latency requirements have dropped to the microsecond range. Providing these service latencies at increasing datacenter utilization is difficult. Operating system functionality on the service critical path often incurs overhead in the millisecond range and induces even longer queueing delay as utilization rises and during component fail-over. My research aims to dramatically lower datacenter service latencies by redesigning hardware and operating system functionality to remove these overheads from the critical path.

In this talk, I focus on the adoption of low-latency persistent memory modules (PMMs). PMMs upend the long-established model of remote storage for distributed file systems: by colocating computation with PMM storage, we can instead provide applications with much higher IO performance, sub-second application failover, and strong consistency. To demonstrate this, I present Assise, a new distributed file system based on a persistent, replicated coherence protocol that manages client-local PMM as a linearizable and crash-recoverable cache between applications and slower (and possibly remote) storage. Assise maximizes locality for all file IO by carrying out IO on process-local, socket-local, and client-local PMM whenever possible. Assise minimizes coherence overhead by maintaining consistency at IO-operation granularity rather than at fixed block sizes. Compared with the state of the art, Assise improves IO latency, throughput, and failover time by an order of magnitude while providing stronger consistency semantics. I finish with an overview of further research in this space.
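To make the claim about operation-granularity coherence more concrete, here is a back-of-the-envelope model comparing how many bytes must be replicated for the same set of small writes under two schemes. This is a minimal sketch, not Assise's implementation or protocol. The 4 KiB block size, the 16-byte per-operation log header, and the names `WriteOp`, `block_granular_bytes`, and `op_granular_bytes` are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical parameters for illustration only (not taken from Assise).
BLOCK_SIZE = 4096    # fixed block size of a block-granular baseline
OP_HEADER = 16       # assumed per-operation log header (offset + length)


@dataclass
class WriteOp:
    """A single file write: byte offset plus the bytes written."""
    offset: int
    data: bytes


def block_granular_bytes(ops, block_size=BLOCK_SIZE):
    """Bytes replicated if every block touched by a write is shipped whole."""
    dirty = set()
    for op in ops:
        if not op.data:
            continue  # an empty write dirties no block
        first = op.offset // block_size
        last = (op.offset + len(op.data) - 1) // block_size
        dirty.update(range(first, last + 1))
    return len(dirty) * block_size


def op_granular_bytes(ops, header=OP_HEADER):
    """Bytes replicated if each write is logged as (header, payload)."""
    return sum(header + len(op.data) for op in ops)


if __name__ == "__main__":
    # Two 100-byte writes that land in different 4 KiB blocks.
    ops = [WriteOp(0, b"x" * 100), WriteOp(5000, b"y" * 100)]
    print(block_granular_bytes(ops))  # 8192
    print(op_granular_bytes(ops))     # 232
```

Under these assumptions, small writes cost whole blocks in the block-granular model but only their payload plus a small header in the operation-granular model; the actual savings in a real system depend on the write pattern and on protocol details this model ignores.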

Brief Biography

Simon is an assistant professor in computer science at The University of Texas at Austin. Simon works to dramatically improve datacenter efficiency and reliability by designing, building, and evaluating new alternatives for their hardware and software components. Simon currently co-designs networking and storage stacks with new hardware technologies to reduce service latencies by orders of magnitude beyond today's capabilities.

Simon is the director of the Texas Systems Research Consortium, where he collaborates closely with industry to shape the future of cloud computing. Simon's work is supported by VMware, Microsoft Research, Huawei, Google, Citadel Securities, and Arm. Simon received the SIGOPS Hall of Fame Award in 2020. He twice received the Jay Lepreau Best Paper Award (2014 and 2016), and he also received a Memorable Paper Award in 2018 and an IEEE Micro Top Pick Honorable Mention in 2021. He received an NSF CAREER Award and is a Sloan Research Fellow. Before joining UT Austin in 2016, Simon was a research associate at the University of Washington from 2012 to 2016. He received a Ph.D. in Computer Science from ETH Zurich in 2012.
