We illustrate that Web searches can often be utilized to generate background text for use with text classification. This is the case because there are frequently many pages on the...
Active data clustering is a novel technique for clustering of proximity data which utilizes principles from sequential experiment design in order to interleave data generation and...
In this paper we introduce a programming language for Web document processing called WebL. WebL is a high level, object-oriented scripting language that incorporates two novel fea...
— One of the critical issues in search engines is the size of search indexes: as the number of documents handled by an engine increases, the search must preserve its efficiency,...
Background: Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We...