MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-...
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, ...
SALSA examines system logs to derive state-machine views of the sytem's execution, along with control-flow, data-flow models and related statistics. Exploiting SALSA's d...
We have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph maximizes the separation between data and metadata manage...
Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Da...
Microaggregation is one of the most employed microdata protection methods. The idea is to build clusters of at least k original records, and then replace them with the centroid of...
Subspace clustering and frequent itemset mining via “stepby-step” algorithms that search the subspace/pattern lattice in a top-down or bottom-up fashion do not scale to large ...