We examine the problem of retrieving the top-m ranked items from a large collection, randomly distributed across an n-node system. In order to retrieve the top m overall, we must ...
Low-Complexity Regions (LCRs) of biological sequences are the main source of false positives in similarity searches for biological sequence databases. We consider the problem of ...
Virtually all histograms store for each bucket the number of distinct values it contains and their average frequency. In this paper, we question this paradigm. We start out by inv...
Data about everything is readily available on the web—but often only accessible through elaborate user interactions. For automated decision support, extracting that data is esse...
Andrew Jon Sellers, Tim Furche, Georg Gottlob, Gio...
Full-text information retrieval systems have traditionally been designed for archival environments. They often provide little or no support for adding new documents to an existing...