On the Harmfulness of Redundant Batch Requests

Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing, HPDC-15, 2006
Pages: 255-266
DOI: 10.1109/HPDC.2006.1652157



Most parallel computing resources are controlled by batch schedulers that place requests for computation in a queue until access to compute nodes is granted. Queue waiting times are notoriously hard to predict, making it difficult for users not only to estimate when their applications may start, but also to pick, among multiple batch-scheduled resources, the one that produces the shortest turnaround time. As a result, an increasing number of users resort to "redundant requests": several requests are simultaneously submitted to multiple batch schedulers on behalf of a single job; once one of these requests is granted access to compute nodes, the others are canceled. Using simulation as well as experiments with a production batch scheduler, we investigate whether redundant requests are harmful in terms of (i) schedule performance and fairness, (ii) system load, and (iii) system predictability. We find that two main issues with redundant requests are load on the middleware and unfairness towards users who do not use redundant requests, both of which depend on the number of users who use redundant requests and on the amount of request redundancy these users employ.
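
The redundant-request mechanism described above can be illustrated with a minimal sketch. This is a toy model, not the simulator or scheduler used in the paper: the `Scheduler` class, the `run_with_redundant_requests` function, and the fixed `wait_time` values are all hypothetical, and each queue's wait is assumed to be known in advance purely so the example is deterministic (in practice it is exactly the quantity users cannot predict).

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Toy batch scheduler holding a queue of job ids (hypothetical model)."""
    name: str
    wait_time: float  # assumed known here; hard to predict in practice
    queue: list = field(default_factory=list)

    def submit(self, job_id: str) -> None:
        self.queue.append(job_id)

    def cancel(self, job_id: str) -> None:
        if job_id in self.queue:
            self.queue.remove(job_id)


def run_with_redundant_requests(job_id, schedulers):
    """Submit job_id to every scheduler, keep the first grant, cancel the rest.

    Returns (name of the scheduler that granted access, cancellations issued).
    """
    for s in schedulers:
        s.submit(job_id)

    # The request with the shortest queue wait is granted first.
    winner = min(schedulers, key=lambda s: s.wait_time)

    cancellations = 0
    for s in schedulers:
        if s is not winner:
            s.cancel(job_id)
            cancellations += 1
    return winner.name, cancellations


if __name__ == "__main__":
    schedulers = [
        Scheduler("A", 120.0),
        Scheduler("B", 45.0),
        Scheduler("C", 300.0),
    ]
    winner, cancellations = run_with_redundant_requests("job-1", schedulers)
    print(winner, cancellations)
```

In this sketch a job sent to r schedulers produces r submissions and r - 1 cancellations, which gives a rough intuition for why middleware load grows with both the number of users employing redundant requests and the degree of redundancy they choose.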