When multiple threads or processes run on a multicore CPU they compete for shared resources, such as caches and memory controllers, and can suffer performance degradation as high as 200%. We design and evaluate a new machine learning model that estimates this degradation online, on previously unseen workloads, and without perturbing the execution. Our motivation is to help data center and HPC cluster operators effectively use workload consolidation. Consolidation places many runnable entities on the same server to maximize hardware utilization, but may sacrifice performance as threads compete for resources. Our model helps determine when consolidation is overly harmful to performance. Our work is the first to apply machine learning to this problem domain, and we report on our experience reaping the advantages of machine learning while navigating around its limitations. We demonstrate how the model can be used to improve performance fidelity and save power for HPC workloads.
Copyright is held by the author.
The author granted permission for the file to be printed and for the text to be copied and pasted.
Member of collection