Clustering is one of the most important tasks for geographic knowledge discovery. However, existing clustering methods have two severe drawbacks for this purpose. First, spatial c...
MapReduce provides a parallel and scalable programming model for data-intensive business and scientific applications. MapReduce and its de facto open source project, called Hadoop...
Of the many P2P file-sharing prototypes in existence, BitTorrent is one of the few that has managed to attract millions of users. BitTorrent relies on other (global) components f...
Johan A. Pouwelse, Pawel Garbacki, Dick H. J. Epem...
We introduce a new reliability infrastructure for file systems called I/O shepherding. I/O shepherding allows a file system developer to craft nuanced reliability policies to de...
Haryadi S. Gunawi, Vijayan Prabhakaran, Swetha Kri...
The work reported here lays the foundations of data exchange in the presence of probabilistic data. This requires rethinking the very basic concepts of traditional data exchange, ...