The Scamseek project, as commissioned by ASIC, has the principal objective of building an industrially viable system that retrieves potential scam candidate documents from the Inte...
Over the past two decades a significant number of layout analysis (page segmentation and region classification) approaches have been proposed in the literature. Each approach has b...
In automated text categorization, given a small number of labeled documents, it is very challenging, if not impossible, to build a reliable classifier that is able to achieve high...
Zenglin Xu, Rong Jin, Kaizhu Huang, Michael R. Lyu...
We study the expressiveness of a positive fragment of path queries, denoted Path+, on node-labeled tree documents. The expressiveness of Path+ is studied from two angles. First, ...
Yuqing Wu, Dirk Van Gucht, Marc Gyssens, Jan Pared...
Good clustering performance depends on the quality of the distance function used to assess similarity. In this paper we propose a pairwise document coreference model to improve pe...
Iustin Dornescu, Constantin Orasan, Tatiana Lesnik...