Search Sciweavers | Sciweavers

6743 search results - page 915 / 1349

» Data quality inference

WWW 2007, ACM

162 views | Internet Technology

Detecting near-duplicates for web crawling

16 years 7 months ago

Download infolab.stanford.edu

Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...

Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma

KDD 2009, ACM

182 views | Data Mining

Scalable graph clustering using stochastic flows: applications to community discovery

16 years 7 months ago

Download www.cse.ohio-state.edu

Algorithms based on simulating stochastic flows are a simple and natural solution for the problem of clustering graphs, but their widespread use has been hampered by their lack of...

Venu Satuluri, Srinivasan Parthasarathy

KDD 2007, ACM

148 views | Data Mining

Scalable look-ahead linear regression trees

16 years 7 months ago

Download www.dataminingsolutions.net

Most decision tree algorithms base their splitting decisions on a piecewise constant model. Often these splitting algorithms are extrapolated to trees with non-constant models at ...

David S. Vogel, Ognian Asparouhov, Tobias Scheffer

KDD 2006, ACM

120 views | Data Mining

Hierarchical topic segmentation of websites

16 years 7 months ago

Download research.yahoo.com

In this paper, we consider the problem of identifying and segmenting topically cohesive regions in the URL tree of a large website. Each page of the website is assumed to have a t...

Ravi Kumar, Kunal Punera, Andrew Tomkins

KDD 2005, ACM

92 views | Data Mining

Summarizing itemset patterns: a profile-based approach

16 years 7 months ago

Download www.xifengyan.net

Frequent-pattern mining has been studied extensively on scalable methods for mining various kinds of patterns including itemsets, sequences, and graphs. However, the bottleneck of...

Xifeng Yan, Hong Cheng, Jiawei Han, Dong Xin
