Ranking is a fundamental operation in data analysis and decision support, and plays an even more crucial role if the dataset being explored exhibits uncertainty. This has led to m...
We present a system for allocating resources in shared data and compute clusters that improves MapReduce job scheduling in three ways. First, the system uses regulated and user-as...
Summarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic q...
Multihoming is increasingly being employed by large enterprises and data centers as a mechanism to extract good performance from their provider connections. Today, multihomed end-...
We describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and may appear to be inherently less efficient than special-purpose...