Sciweavers

3530 search results - page 428 / 706
» Technology of Text Mining
WWW
2009
ACM
16 years 7 months ago
A densitometric analysis of web template content
What makes template content in the Web so special that we need to remove it? In this paper I present a large-scale aggregate analysis of textual Web content, corroborating statist...
Christian Kohlschütter
WWW
2009
ACM
16 years 7 months ago
Detecting image spam using local invariant features and pyramid match kernel
Image spam is a new obfuscating method which spammers invented to more effectively bypass conventional text based spam filters. In this paper, we extract local invariant features ...
Haiqiang Zuo, Weiming Hu, Ou Wu, Yunfei Chen, Guan...
WWW
2009
ACM
16 years 7 months ago
SOFIE: a self-organizing framework for information extraction
This paper presents SOFIE, a system for automated ontology extension. SOFIE can parse natural language documents, extract ontological facts from them and link the facts into an on...
Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum
WWW
2008
ACM
16 years 7 months ago
Psst: a web-based system for tracking political statements
Determining candidates' views on important issues is critical in deciding whom to support and vote for; but finding their statements and votes on an issue can be laborious. I...
Samantha Kleinberg, Bud Mishra
WWW
2008
ACM
16 years 7 months ago
Size matters: word count as a measure of quality on wikipedia
Wikipedia, "the free encyclopedia", now contains over two million English articles, and is widely regarded as a high-quality, authoritative encyclopedia. Some Wikipedia a...
Joshua E. Blumenstock