In this paper, we present Google, a prototype of a large-scale search engine that makes heavy use of the structure present in hypertext. Google is designed to crawl and index the...
People often find useful content on the web via social media. However, it is difficult to manually aggregate the information and recommendations embedded in a torrent of social ...
Structured information in the form of semantic data is increasingly available. It offers a wide range of new possibilities, especially for semantic search and Web data in...
Since the website is one of the most important organizational structures of the Web, ranking websites effectively is essential to many Web applications, such as Web sear...
Duplication of Web pages greatly hurts the perceived relevance of a search engine. Existing methods for detecting duplicated Web pages can be classified into two categories, i.e. o...