Abstract: In this paper we suggest a new approach to representing text document collections, integrating background knowledge to improve clustering effectiveness. Background knowled...
The vocabulary of the TREC Legal OCR collection is noisy and huge. Standard techniques for improving retrieval performance such as content-based query expansion are ineffective fo...
Many interesting Web-based AI problems require the ability to collect, store and process large text datasets. To address this problem, we have developed Slashpack, an integrated t...
Christopher H. Brooks, Monica Agarwal, Jason Endo,...
In this paper, we study the classification problem involving information spanning multiple private databases. The privacy challenges lie in the fact that data cannot be collected...
Indexing and retrieval techniques for homology searching of genomic databases are increasingly important as search tools face the great challenge of rapid growth in sequence...
Simon M. C. Yuen, Fu-Lai Chung, Robert Wing Pong L...