We present an adaptive distributed query-sampling framework that is quality-conscious for extracting high-quality text database samples. The framework divides the query-based samp...
In China-US Million Book Digital Library, output of the digitalization process is more than one terabyte of text in OEB and PDF format. To access these data quickly and accurately,...
Background: In the context of the BioCreative competition, where training data were very sparse, we investigated two complementary tasks: 1) given a Swiss-Prot triplet, containing...
Forming test collection relevance judgments from the pooled output of multiple retrieval systems has become the standard process for creating resources such as the TREC, CLEF, and...
This perspective paper explores principles of unsupervised learning and how they relate to face recognition. Dependency coding and information maximization appear to be central pr...