Real-time monitoring is becoming increasingly important in various scenarios of large-scale, multi-site distributed/parallel computing, e.g., understanding behavior of systems, schedu...
We propose a computationally efficient method based on nonlinear optimization to identify critical lines whose failure can cause severe blackouts. Our method computes crit...
A Web services-based publish/subscribe system has the potential to create an Internet-scale interoperable event notification system, which is important for Grid computing as it e...
Yi Huang, Aleksander Slominski, Chathura Herath, D...
Designing highly dependable systems requires a good understanding of failure characteristics. Unfortunately, little raw data on failures in large IT installations is publicly avai...
Experience with generating simulation data of high energy physics experiments has shown that a job monitoring system (JMS) is essential to understand failures of jobs within the G...