To improve data availability and resilience MapReduce frameworks use file systems that replicate data uniformly. However, analysis of job logs from a large production cluster show...
We present a divide-and-merge methodology for clustering a set of objects that combines a top-down "divide" phase with a bottom-up "merge" phase. In contrast, ...
David Cheng, Santosh Vempala, Ravi Kannan, Grant W...
Directed diffusion is a prominent example of data-centric routing based on application layer data and purely local interactions. In its functioning it relies heavily on network-wid...
Process Mining refers to the extraction of process models from event logs. Real-life processes tend to be less structured and more flexible. Traditional process mining algorithms...
R. P. Jagadeesh Chandra Bose, Wil M. P. van der Aa...
We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment th...
Filippo Geraci, Marco Pellegrini, Paolo Pisati, Fa...