Abstract: By means of RDFa it is possible to embed semantic meaning into standard XHTML web pages. Using the meaning, we provide content-sensitive user interfaces for web pages int...
In this paper, we try to leverage a large-scale and multilingual knowledge base, Wikipedia, to help effectively analyze and organize Web information written in different languages...
Social media such as Web forum often have dense interactions between user and content where network models are often appropriate for analysis. Joint non-negative matrix factorizat...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
—Web 2.0 applications, including blogs, wikis and social networking sites, pose challenging privacy issues. Many users are unaware that search engines index personal information ...
Michael Hart, Claude Castille, Rob Johnson, Amanda...