ASCAR for Lustre

Introduction

High-performance parallel storage systems, such as those used by supercomputers and data centers, can suffer performance degradation when a large number of clients contend for limited resources such as bandwidth. This contention lowers the efficiency of the system and causes unwanted variance in speed. We present the Automatic Storage Contention Alleviation and Reduction system (ASCAR), a storage traffic management system that improves bandwidth utilization and the fairness of resource allocation. ASCAR regulates I/O traffic from the clients using a rule-based algorithm that controls the congestion window and request rate; it requires no runtime coordination between clients or with a central coordinator. A distributed rule-based system responds quickly and scales well, but optimal rules are hard to design. We therefore designed the SHAred-nothing Rule Producer (SHARP), which produces rules in an unsupervised manner by systematically exploring the solution space of possible rule designs and evaluating the target workload under the candidate rule sets. Evaluation shows that our ASCAR prototype improves the throughput of all tested workloads, some by as much as 35%. ASCAR improves the throughput of a NASA NPB BTIO checkpoint workload by 33.5% and reduces its speed variance by 55.4% at the same time. By abandoning the time-consuming communication between clients that most existing traffic control solutions require, ASCAR achieves high responsiveness and scalability; it can efficiently handle highly dynamic workloads, such as burst I/O. The optimization time and controller overhead are independent of the scale of the system, so ASCAR has the potential to support millions of clients. As a pure client-side solution, ASCAR requires no change to either the hardware or the server software.
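
To make the idea of a client-side, rule-based controller more concrete, here is a minimal sketch in Python. It is not ASCAR's actual implementation or rule format: the `Rule` fields, the latency thresholds, the `ClientController` class, and every numeric value are hypothetical and chosen only for illustration. The sketch shows one client smoothing its own observed reply latency, finding the first rule whose range matches, and applying that rule's action to its congestion window and request pacing, without contacting any other client or a coordinator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: matches when smoothed latency is in [lo_ms, hi_ms)."""
    lo_ms: float
    hi_ms: float
    window_scale: float  # multiplier applied to the congestion window
    window_step: int     # additive change applied after scaling
    min_gap_ms: float    # minimum gap between requests (a simple rate limit)


class ClientController:
    """Per-client controller that uses only locally observed state."""

    def __init__(self, rules: List[Rule], window: int = 8,
                 min_window: int = 1, max_window: int = 256,
                 alpha: float = 0.25) -> None:
        self.rules = rules
        self.window = window
        self.min_window = min_window
        self.max_window = max_window
        self.alpha = alpha                       # EWMA smoothing factor
        self.latency_ewma: Optional[float] = None
        self.min_gap_ms = 0.0

    def on_reply(self, latency_ms: float) -> Optional[Rule]:
        """Update the smoothed latency, then apply the first matching rule."""
        if self.latency_ewma is None:
            self.latency_ewma = latency_ms
        else:
            self.latency_ewma = (self.alpha * latency_ms
                                 + (1.0 - self.alpha) * self.latency_ewma)
        for rule in self.rules:
            if rule.lo_ms <= self.latency_ewma < rule.hi_ms:
                proposed = int(self.window * rule.window_scale) + rule.window_step
                # Keep the window inside its configured bounds.
                self.window = max(self.min_window,
                                  min(self.max_window, proposed))
                self.min_gap_ms = rule.min_gap_ms
                return rule
        return None  # no rule matched; leave the current settings unchanged


# Hypothetical two-rule set: grow the window slowly while latency is low,
# halve it and pace requests once latency suggests the servers are congested.
EXAMPLE_RULES = [
    Rule(lo_ms=0.0, hi_ms=10.0, window_scale=1.0, window_step=1, min_gap_ms=0.0),
    Rule(lo_ms=10.0, hi_ms=float("inf"), window_scale=0.5, window_step=0,
         min_gap_ms=2.0),
]

if __name__ == "__main__":
    controller = ClientController(EXAMPLE_RULES)
    for latency in (5.0, 50.0):
        controller.on_reply(latency)
        print(controller.window, controller.min_gap_ms)
```

In this sketch the rule set is plain data, so a rule producer in the spirit of SHARP could generate candidate `Rule` lists, run the target workload under each, and keep the best-performing set. That search loop is not shown here.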

See the SSRC ASCAR Page for more information.