The proliferation of digital libraries and the large amount of existing documents raise important issues in efficient handling of documents. Printed texts in documents need to be...
Abstract. In this paper we describe a methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimum user intervention. The methodol...
Fabio Ciravegna, Sam Chapman, Alexiei Dingli, Yori...
With an increasing amount of semi-structured data XML has become important. XML documents may contain private information that cannot be shared by all user communities. Therefore,...
We present an approach to the discovery of semantically similar terms that utilizes a web search engine as both a source for generating related terms and a tool for estimating the...
The field of information retrieval still strives to develop models which allow semantic information to be integrated in the ranking process to improve performance in comparison to...