Search Sciweavers | Sciweavers

233 search results - page 22 / 47

» Clustering documents in a web directory

BMCBI
2006

Automatic document classification of biological literature

Download www.biomedcentral.com

Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, ...

David Chen, Hans-Michael Müller, Paul W. Ster...

KDD
2002
ACM

Enhanced word clustering for hierarchical text classification

Download www.cs.utexas.edu

In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering"...

Inderjit S. Dhillon, Subramanyam Mallela, Rahul Ku...
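The abstract above is cut off before it describes the algorithm itself. As a rough illustration only, one common way to cluster words information-theoretically is to represent each word w by its class distribution p(C|w) and run a k-means-style loop that assigns each word to the cluster whose prior-weighted mean distribution is closest in KL divergence. The sketch below assumes that formulation; the function names, the argmax-based initialization, and the smoothing constant EPS are hypothetical choices, not taken from the paper, whose divisive algorithm may differ in detail.

```python
import math

EPS = 1e-12  # hypothetical smoothing so the KL divergence stays finite


def kl_divergence(p, q):
    """KL divergence D(p || q) in nats; zero-probability terms of p contribute 0."""
    return sum(pi * math.log(pi / max(qi, EPS)) for pi, qi in zip(p, q) if pi > 0)


def centroid(members, dists, priors):
    """Prior-weighted mean of p(C | w) over the words in one cluster."""
    total = sum(priors[i] for i in members)
    return [sum(priors[i] * dists[i][c] for i in members) / total
            for c in range(len(dists[0]))]


def cluster_words(dists, priors, k, max_iters=50):
    """k-means-style word clustering under KL divergence (illustrative sketch).

    dists[i] is the class distribution p(C | w_i); priors[i] is p(w_i).
    Returns one cluster index per word.
    """
    # Assumed initialization: each word starts in the cluster of its most likely class.
    assign = [max(range(len(p)), key=p.__getitem__) % k for p in dists]
    for _ in range(max_iters):
        members = [[i for i, a in enumerate(assign) if a == j] for j in range(k)]
        # Empty clusters get no centroid and cannot attract words.
        cents = [centroid(m, dists, priors) if m else None for m in members]
        live = [j for j in range(k) if cents[j] is not None]
        # Reassign every word to the centroid with the smallest KL divergence.
        new_assign = [min(live, key=lambda j: kl_divergence(p, cents[j])) for p in dists]
        if new_assign == assign:  # converged: no word changed cluster
            break
        assign = new_assign
    return assign
```

For example, cluster_words([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]], [0.25] * 4, 2) separates the two words that favor the first class from the two that favor the second, returning [0, 0, 1, 1].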

WWW
2007
ACM

On building graphs of documents with artificial ants

Download www2007.org

We present an incremental algorithm for building a neighborhood graph from a set of documents. This algorithm is based on a population of artificial agents that imitate the way re...

Hanane Azzag, Julien Lavergne, Christiane Guinot, ...

DEXAW
2008
IEEE

Text Extraction from the Web via Text-to-Tag Ratio

Download www.uni-weimar.de

We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...

Tim Weninger, William H. Hsu
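The abstract names the key quantity but is truncated before the details. A minimal sketch, assuming the Text-to-Tag Ratio of a line is its count of non-tag characters divided by its count of tags (falling back to the text length when a line has no tags), might look like this. The regex-based tag matching, the whitespace stripping, and the threshold in the hypothetical extract_content helper are simplifying assumptions; a real extractor would at least remove script and style contents first, and the paper's own pipeline may add further steps.

```python
import re

# Simplified tag matcher (assumption): anything between '<' and the next '>'.
TAG_RE = re.compile(r"<[^>]*>")


def text_to_tag_ratios(html):
    """Return one Text-to-Tag Ratio per line of an HTML document.

    Assumed definition: non-tag characters on the line divided by the number
    of tags on it; a line without tags scores its text length.
    """
    ratios = []
    for line in html.splitlines():
        tags = TAG_RE.findall(line)
        text = TAG_RE.sub("", line).strip()
        ratios.append(len(text) / len(tags) if tags else float(len(text)))
    return ratios


def extract_content(html, threshold):
    """Keep the tag-free text of lines whose ratio exceeds a chosen threshold."""
    lines = html.splitlines()
    return [TAG_RE.sub("", line).strip()
            for line, ratio in zip(lines, text_to_tag_ratios(html))
            if ratio > threshold]
```

On the two-line input '<div><a href="/">Home</a></div>' followed by '<p>This is a long paragraph of article text.</p>', the navigation line scores 4 / 4 = 1.0 and the paragraph line 41 / 2 = 20.5, so a threshold of 5 keeps only the paragraph. Working per line keeps the computation independent of any particular site's markup, which matches the abstract's motivation for avoiding page-specific HTML cues.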

SAC
2006
ACM

A scalable algorithm for high-quality clustering of web snippets

Download nmis.isti.cnr.it

We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment th...

Filippo Geraci, Marco Pellegrini, Paolo Pisati, Fa...
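The problem stated in this abstract, splitting n points of a metric space into k non-overlapping clusters, is the setting of metric k-center clustering. The truncated abstract does not say which algorithm the authors build on, so the sketch below shows only the classic farthest-point-first heuristic of Gonzalez, a well-known 2-approximation for k-center, as a baseline: it repeatedly promotes the point farthest from all current centers to a new center and moves each point to its nearest center. The function name and the caller-supplied dist function are illustrative assumptions.

```python
def farthest_point_first(points, k, dist):
    """Gonzalez's farthest-point-first heuristic for metric k-center clustering.

    Returns (centers, assign): centers holds indices into points, and assign[i]
    is the position in centers of the cluster that points[i] belongs to.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= len(points)")
    centers = [0]  # the first center is arbitrary; take the first point
    # nearest[i]: distance from points[i] to its closest center chosen so far
    nearest = [dist(p, points[0]) for p in points]
    assign = [0] * n
    while len(centers) < k:
        # Promote the point farthest from every current center.
        far = max(range(n), key=lambda i: nearest[i])
        centers.append(far)
        slot = len(centers) - 1
        # Move every point that is now strictly closer to the new center.
        for i in range(n):
            d = dist(points[i], points[far])
            if d < nearest[i]:
                nearest[i] = d
                assign[i] = slot
    return centers, assign
```

With points [0, 1, 10, 11, 20], k = 3 and dist = lambda a, b: abs(a - b), it picks the points at indices 0, 4 and 2 as centers and returns the assignment [0, 0, 2, 2, 1]. Each new center costs one pass over the n points, so a full run takes O(nk) distance evaluations, one reason farthest-point-first variants are popular when speed matters.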
