Scheduling latency-sensitive applications in large-scale datacenters is challenging. Current approaches use application-layer schedulers, which impose high overheads and result in long latencies. We present Saqr, the first in-network, datacenter-wide scheduler that supports short tasks with execution times on the order of tens of microseconds. Saqr introduces new network-level constructs and a distributed scheduling policy that enable network switches to schedule tasks efficiently within the network, at line rate and with minimal latency. We implemented Saqr in a testbed with high-speed programmable switches and compared its performance against the state-of-the-art in-network scheduler, Racksched. Our results show that, compared to Racksched, Saqr can reduce the tail response time by up to 85% and the processing load on switches by up to 2.5x. In addition, we compared Saqr with Racksched in large-scale simulations with diverse and dynamic workloads; the results show that Saqr substantially outperforms Racksched across all performance metrics.
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Hefeeda, Mohamed