Many large-scale production parallel programs run for a very long time and require periodic data checkpointing to save the state of the computation for program restart and/o...
Wei-keng Liao, Kenin Coloma, Alok N. Choudhary, Le...
Considerable work has been done on providing fault tolerance capabilities for different software components on large-scale high-end computing systems. Thus far, however, these fa...
Rinku Gupta, Pete Beckman, Byung-Hoon Park, Ewing ...
Massively Multiplayer Online Games (MMOGs) currently entertain millions of players daily. To keep these players online and generate revenue, MMOGs rely on manually...
Content-based publish/subscribe (pub/sub) is a promising paradigm for building asynchronous distributed applications. In many application scenarios, these systems are required to ...
We claim that network services can be transparently added to existing unmodified applications running inside virtual machine environments. Examples of these network services inclu...