What makes template content in the Web so special that we need to remove it? In this paper I present a large-scale aggregate analysis of textual Web content, corroborating statist...
Image spam is a new obfuscating method which spammers invented to more effectively bypass conventional text based spam filters. In this paper, we extract local invariant features ...
Haiqiang Zuo, Weiming Hu, Ou Wu, Yunfei Chen, Guan...
This paper presents SOFIE, a system for automated ontology extension. SOFIE can parse natural language documents, extract ontological facts from them and link the facts into an on...
Determining candidates' views on important issues is critical in deciding whom to support and vote for; but finding their statements and votes on an issue can be laborious. I...
Wikipedia, "the free encyclopedia", now contains over two million English articles, and is widely regarded as a highquality, authoritative encyclopedia. Some Wikipedia a...