Technical Report Number: WUCSE-2004-52
A dedicated cluster is not always fully utilized, even when all of its processors are allocated to jobs. This occurs whenever a running job uses less than 100% of each processor allocated to it. Keeping in mind the needs of both the cluster’s system administrators and its users, we would like to increase the throughput and efficiency of the cluster while maintaining or improving the average turnaround time of the jobs and the quality of service of the “primary” jobs originally scheduled on the cluster. To increase throughput and efficiency, we schedule background jobs to run concurrently with the primary jobs. To maintain or improve average turnaround time and the quality of service of the primary jobs, we investigate two methods of prioritizing CPU usage between primary and background jobs. The first method uses the existing “nice” mechanism in the 2.4 Linux kernel to give background processes a lower priority than primary processes. The second method modifies the 2.4 Linux kernel’s CPU scheduler to create a new guest process priority that prevents guest processes from running whenever primary processes are runnable. Our results come from empirical investigations using real production applications that are regularly run in the dedicated cluster environment we used for testing. Statistics such as wall time and CPU time are measured directly from test runs of these same applications, which is helpful for comparison with results from models and synthetic applications. We found that the existing nice mechanism significantly improves the throughput, efficiency and average turnaround time of the cluster, but only at the expense of the quality of service of the primary jobs (primary job running times increased by 5-25%). The guest process priority, on the other hand, yields similar improvements in throughput, efficiency and average turnaround time without significantly affecting the quality of service of the primary jobs (primary job running times changed by less than 1%).
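As a rough illustration of the first method only, the sketch below starts a background process at a lower priority through the standard nice mechanism. It is a minimal user-space example, not the report's test harness: the helper name `run_background`, the increment of 19, and the probe command are assumptions made for illustration. The guest process priority requires a modified kernel scheduler and cannot be reproduced from user space this way.

```python
import os
import subprocess
import sys


def run_background(cmd, increment=19):
    """Start cmd as a low-priority background process (hypothetical helper).

    The increment is added to the child's nice value just before exec;
    Linux clamps the resulting nice value at 19, the lowest priority.
    """
    def lower_priority():
        # Runs in the child process only, after fork and before exec.
        os.nice(increment)

    return subprocess.Popen(cmd, preexec_fn=lower_priority,
                            stdout=subprocess.PIPE, text=True)


if __name__ == "__main__":
    # Stand-in for a background job: a child that reports its own nice value.
    probe = [sys.executable, "-c", "import os; print(os.nice(0))"]
    proc = run_background(probe)
    out, _ = proc.communicate()
    print("parent nice:", os.nice(0), "child nice:", out.strip())
```

A niced process still receives CPU time while primary processes are runnable, only less of it. That is consistent with the slowdown of primary jobs reported for this method, and it is the behavior the guest process priority is designed to prevent.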
Stiehr, Gary, "Using Fine-Grained Cycle Stealing to Improve Throughput, Efficiency and Response Time on a Dedicated Cluster while Maintaining Quality of Service" Report Number: WUCSE-2004-52 (2004). All Computer Science and Engineering Research.