An approach to postal address detection from webpages is proposed. The webpages are first segmented into text blocks based on their visual similarity. The text content in each bl...
In this paper, we present the results of an investigation into methodologies and technical solutions for exposing the structured metadata contained within digital qualitative data...
Conventions for conducting work with groupware are essential. They include rules for how the groupware functionality should be used for communication about work, for how data shoul...
This paper considers the problem of identifying on the Web compound documents (cDocs): groups of web pages that in aggregate constitute semantically coherent information entities...
Automatically categorizing documents into pre-defined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques ...