: We describe our participation in the TREC 2008 Enterprise track and detail our language modeling-based approaches. For document search, our focus was on query expansion using pro...
Over the past decade the Internet has evolved into the largest public community in the world. It provides a wealth of data content and services in almost every field of science, t...
In this paper, we propose the combination of the Self Organizing Map (SOM) and of the tangent distance for effective clustering in Document Image Analysis. The proposed model (SOM...
Digitized legacy document marked up with XML can be used in many ways, e.g., to generate RDF statements about the world described. A prerequisite for doing so is that the document ...
Text classification using positive and unlabeled data refers to the problem of building text classifier using positive documents (P) of one class and unlabeled documents (U) of man...