The productivity of HPC systems is determined not only by their performance, but also by their reliability. The conventional method to limit the impact of failures is checkpointing...
We describe the design and implementation of MOCCA, a distributed CCA framework implemented using the H2O metacomputing system. Motivated by the quest for appropriate metasyste...
Maciej Malawski, Dawid Kurzyniec, Vaidy S. Sundera...
Distributed applications execute in environments that can include different network architectures as well as a range of compute platforms. Furthermore, these resources are shared ...
DataScalar architectures improve memory system performance by running computation redundantly across multiple processors, which are each tightly coupled with an associated memory....
This paper describes a new scheme for guaranteeing that transactions in a client/server system observe consistent state while they are running. The scheme is presented in conjunction with ...