Sciweavers

» Clustering documents in a web directory
BMCBI
2006
Automatic document classification of biological literature
Background: Document classification is a wide-spread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, ...
David Chen, Hans-Michael Müller, Paul W. Ster...
KDD
2002
ACM
Enhanced word clustering for hierarchical text classification
In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering"...
Inderjit S. Dhillon, Subramanyam Mallela, Rahul Ku...
WWW
2007
ACM
On building graphs of documents with artificial ants
We present an incremental algorithm for building a neighborhood graph from a set of documents. This algorithm is based on a population of artificial agents that imitate the way re...
Hanane Azzag, Julien Lavergne, Christiane Guinot, ...
DEXAW
2008
IEEE
Text Extraction from the Web via Text-to-Tag Ratio
We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
Tim Weninger, William H. Hsu
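
The abstract above is truncated, so the following is only a minimal sketch of the general idea of a per-line text-to-tag ratio, not the authors' implementation. It assumes the ratio for a line is the number of non-tag characters divided by the number of tags (treated as 1 when the line has no tags), and it uses a fixed cutoff to pick content lines. The names `text_to_tag_ratios`, `extract_content`, and the `threshold` value are hypothetical, chosen for illustration.

```python
import re
from typing import List

# Naive tag matcher; an assumption for illustration, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]*>")


def text_to_tag_ratios(html: str) -> List[float]:
    """Return one text-to-tag ratio per line of the HTML source."""
    ratios = []
    for line in html.splitlines():
        tag_count = len(TAG_RE.findall(line))
        text_length = len(TAG_RE.sub("", line).strip())
        # Assumption: a line with no tags is divided by 1, not by 0.
        ratios.append(text_length / max(tag_count, 1))
    return ratios


def extract_content(html: str, threshold: float = 10.0) -> List[str]:
    """Keep the text of lines whose ratio meets a hypothetical fixed cutoff."""
    lines = html.splitlines()
    ratios = text_to_tag_ratios(html)
    return [
        TAG_RE.sub("", line).strip()
        for line, ratio in zip(lines, ratios)
        if ratio >= threshold
    ]


sample = (
    "<div><a href='x'>Home</a><a>News</a></div>\n"
    "<p>This is a long paragraph of real article content here.</p>"
)
ratios = text_to_tag_ratios(sample)
content = extract_content(sample)
```

In this sample the navigation line has many tags and little text, so its ratio is low, while the paragraph line has two tags and a full sentence, so it passes the cutoff. A fixed threshold is the simplest possible choice here; a real extractor would likely need smoothing or an adaptive cutoff.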
SAC
2006
ACM
A scalable algorithm for high-quality clustering of web snippets
We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment th...
Filippo Geraci, Marco Pellegrini, Paolo Pisati, Fa...
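
The abstract above breaks off before naming the technique it augments, so the sketch below is not a reconstruction of the paper's algorithm. It only illustrates the stated problem, partitioning n points of a metric space into k non-overlapping clusters, using one classical heuristic (farthest-first traversal) as an assumed stand-in. The function name `farthest_first` and the use of Euclidean distance via `math.dist` are choices made for this example; documents would need their own distance function.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def farthest_first(
    points: List[Point],
    k: int,
    dist: Callable[[Point, Point], float] = math.dist,
) -> Tuple[List[int], List[int]]:
    """Pick k centers greedily and assign every point to its nearest center.

    Returns (center_indices, assignment), where assignment[i] is the
    position in center_indices of the cluster that point i belongs to.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    centers = [0]  # Assumption: start from the first point.
    nearest = [dist(p, points[0]) for p in points]
    assignment = [0] * len(points)

    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        new_center = max(range(len(points)), key=nearest.__getitem__)
        centers.append(new_center)
        for i, p in enumerate(points):
            d = dist(p, points[new_center])
            if d < nearest[i]:
                nearest[i] = d
                assignment[i] = len(centers) - 1
    return centers, assignment


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, assignment = farthest_first(pts, 2)
```

Each point ends up in exactly one cluster, which matches the non-overlapping requirement in the abstract. The sketch recomputes n distances per new center, so it performs on the order of n times k distance evaluations.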