Aquila: A unified, low-latency fabric for datacenter networks (2022)
Datacenter workloads have evolved from the data intensive, loosely-coupled workloads of the past decade to more tightly coupled ones, wherein ultra-low latency communication is essential for resource disaggregation over the network and to enable emerging programming models.
We introduce Aquila, an experimental datacenter network fabric built with ultra-low latency support as a first-class design goal, while also supporting traditional datacenter traffic. Aquila uses a new Layer 2 cell-based protocol, GNet, an integrated switch, and a custom ASIC with low-latency Remote Memory Access (RMA) capabilities co-designed with GNet. We demonstrate that Aquila is able to achieve under 40 μs tail fabric Round Trip Time (RTT) for IP traffic and sub-10 μs RMA execution time across hundreds of host machines, even in the presence of background throughput-oriented IP traffic. This translates to more than 5x reduction in tail latency for a production quality key-value store running on a prototype Aquila network.
Distributed caching is a key component in the design of performant, scalable Internet services, but accessing such caches via RPC incurs high cost. Remote Memory Access (RMA) offers a promising, less costly alternative, but achieving a rich production feature set with RMA-based systems is a significant challenge, as the rich abstraction of RPC lends itself to solutions for interoperability and upgradeability requirements of real systems. This work describes CliqueMap, a fully productionized RMA/RPC hybrid serving and caching system, and the production experience derived from three years of operation in Google’s datacenters. Building on internal technologies, CliqueMap serves multiple internal product areas and underlies several end-user-visible services.
Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, Association for Computing Machinery, New York, NY, USA (2020), 708–721
Remote Direct Memory Access (RDMA) plays a key role in supporting performance-hungry datacenter applications. However, existing RDMA technologies are ill-suited to multi-tenant datacenters, where applications run at massive scales, tenants require isolation and security, and the workload mix changes over time. Our experiences seeking to operationalize RDMA at scale indicate that these ills are rooted in standard RDMA's basic design attributes: connection-orientedness and complex policies baked into hardware.
We describe a new approach to remote memory access -- One-Shot RMA (1RMA) -- suited to the constraints imposed by our multi-tenant datacenter settings. The 1RMA NIC is connection-free and fixed-function; it treats each RMA operation independently, assisting software by offering fine-grained delay measurements and fast failure notifications. 1RMA software provides operation pacing, congestion control, failure recovery, and inter-operation ordering, when needed. The NIC, deployed in our production datacenters, supports encryption at line rate (100Gbps and 100M ops/sec) with minimal performance/availability disruption for encryption key rotation.
It is our pleasure to introduce the 2015 Top Picks in Computer Architecture. We co-chaired the Selection Committee that had the formidable task of selecting the best computer architecture papers that were published in conferences in the previous year. Many excellent papers are published every year, and choosing among them is challenging, not least because of the need to define “best.” The committee identified 11 papers as being Top Picks this year. The range of topics is wide and reflects the healthy broadening of what the community considers to be computer architecture.