The general approach to fault tolerance in uniprocessor systems is to maintain enough time redundancy in the schedule so that any task instance can be re-executed in presence of f...
We consider the issue of deadline tardiness under global multiprocessor scheduling algorithms. We present a general tardiness-bound derivation that is applicable to a wide variety...
The ability to reserve network bandwidth is critical for the success of high-performance grid applications. While reservation of lightpaths in dynamically switched optical network...
Multi-cluster schedulers can dramatically improve average job turn-around time performance by making use of fragmented node resources available throughout the grid. By carefully m...
In multicluster systems, and more generally, in grids, jobs may require co-allocation, i.e., the simultaneous allocation of resources such as processors and input files in multip...