Excessive power consumption is becoming a major barrier to extracting the maximum performance from high-performance parallel systems. Therefore, techniques oriented towards reduci...
As parallel jobs grow larger and finer in granularity, “system noise” is becoming an increasingly serious problem. In fact, fine-grained jobs on clusters with thousands of SMP...
Dan Tsafrir, Yoav Etsion, Dror G. Feitelson, Scott...
Learning useful and predictable features from past workloads and exploiting them well is a major source of improvement in many operating system problems. We review known parallel ...
Aiming to clarify the convergence or divergence conditions for Learning Classifier Systems (LCS), this paper explores: (1) an extreme condition where the reinforcement process of ...
The increasing complexity of today’s systems makes fast and accurate failure detection essential for their use in mission-critical applications. Various monitoring methods provi...