This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...
Due to the fast growth of internet based electronic business activities, languages for modeling as well as methods for analyzing and executing distributed business processes are be...
This paper presents MetaNews, an information gathering agent for news articles on the Web. MetaNews reads HTML documents from online news sites and extracts article information fro...
In this paper we study the problem of collecting training samples for building enterprise taxonomies. We develop a computer-aided tool named InfoAnalyzer, which can effectively as...
This paper describes hybrid search, a search method supporting both document and knowledge retrieval via the flexible combination of ontologybased search and keyword-based matching...
Ravish Bhagdev, Sam Chapman, Fabio Ciravegna, Vita...