To achieve high reliability and scalability, most large-scale data warehouse systems have adopted a cluster-based architecture. In this paper, we propose the design of a new clu...
Yuting Lin, Divyakant Agrawal, Chun Chen, Beng Chi...
Over the last decade, the cost of producing genomic sequences has dropped dramatically due to current, so-called “next-gen” sequencing methods. However, these next-gen seque...
We present a hybrid method to turn off-the-shelf information retrieval (IR) systems into future event predictors. Given a query, a time series model is trained on the publication...
Most existing reachability indices perform well on small- to medium-size graphs, but reach a scalability bottleneck at around one million vertices/edges. As graphs become inc...
We present BloomUnit, a testing framework for distributed programs written in the Bloom language. BloomUnit allows developers to write declarative test specifications that descri...
Peter Alvaro, Andrew Hutchinson, Neil Conway, Will...