We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
The Semantic Web is an extension of the current Web in which information is given well-defined meaning to support effective data discovery and integration. The RDF framework is a...
This research is about automatic identification and extraction of person names in Chinese text documents. Solutions to this problem have immediate and extensive applications in ma...
Evaluation of IR systems has always been difficult because of the need for manually assessed relevance judgments. The advent of large editor-driven taxonomies on the web opens the...
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury...
Rapid increase in the number of pages on web sites, and widespread use of search engine optimization techniques, lead to web sites becoming difficult to navigate. Traditional site ...