Automatically clustering web pages into semantic groups promises improved search and browsing on the web. In this paper, we demonstrate how user-generated tags from largescale soc...
Daniel Ramage, Paul Heymann, Christopher D. Mannin...
A central theme of the semantic web is that programs should be able to easily aggregate data from different sources. Unfortunately, even if two sites provide their data using the ...
Heterogeneous and dirty data is abundant. It is stored under different, often opaque schemata, it represents identical real-world objects multiple times, causing duplicates, and ...
Alexander Bilke, Jens Bleiholder, Christoph Bö...
We are experiencing an unprecedented increase of content contributed by users in forums such as blogs, social networking sites and microblogging services. Such abundance of conten...
This paper presents an extensive study about the evolution of textual content on the Web, which shows how some new pages are created from scratch while others are created using al...