Although documents have hundreds of thousands of unique words, only a small number of words are significantly useful for intelligent services. For this reason, feature extraction ...
A great number of documents are scanned and archived as digital images in digital libraries to make them available and accessible on the Internet. Information retriev...
In this paper we describe a methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimum user intervention. The methodol...
Fabio Ciravegna, Sam Chapman, Alexiei Dingli, Yori...
This paper summarises our work in textual Case-Based Reasoning within jCOLIBRI. We use Information Extraction techniques to annotate web pages to facilitate semantic retr...
The structural features of XML components are an extra source of information that should be used in a content-oriented retrieval task on this type of document. This paper explores...