The increasing popularity of cloud storage is leading organizations to consider moving data out of their own data centers and into the cloud. However, success for cloud storage pr...
Hussam Abu-Libdeh, Lonnie Princehouse, Hakim Weath...
This work addresses the need for stateful dataflow programs that can rapidly sift through huge, evolving data sets. These data-intensive applications perform complex multi-step c...
Dionysios Logothetis, Christopher Olston, Benjamin...
Batched stream processing is a new distributed data processing paradigm that models recurring batch computations on incrementally bulk-appended data streams. The model is inspired...
Bingsheng He, Mao Yang, Zhenyu Guo, Rishan Chen, B...
Recently there has been significant interest in employing probabilistic techniques for fault localization. Using dynamic dependence information for multiple passing runs, learnin...
We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet...