Sciweavers

» Data quality inference
WWW
2007
ACM
16 years 7 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
KDD
2009
ACM
16 years 7 months ago
Scalable graph clustering using stochastic flows: applications to community discovery
Algorithms based on simulating stochastic flows are a simple and natural solution for the problem of clustering graphs, but their widespread use has been hampered by their lack of...
Venu Satuluri, Srinivasan Parthasarathy
KDD
2007
ACM
16 years 7 months ago
Scalable look-ahead linear regression trees
Most decision tree algorithms base their splitting decisions on a piecewise constant model. Often these splitting algorithms are extrapolated to trees with non-constant models at ...
David S. Vogel, Ognian Asparouhov, Tobias Scheffer
KDD
2006
ACM
16 years 7 months ago
Hierarchical topic segmentation of websites
In this paper, we consider the problem of identifying and segmenting topically cohesive regions in the URL tree of a large website. Each page of the website is assumed to have a t...
Ravi Kumar, Kunal Punera, Andrew Tomkins
KDD
2005
ACM
16 years 7 months ago
Summarizing itemset patterns: a profile-based approach
Frequent-pattern mining has been studied extensively on scalable methods for mining various kinds of patterns including itemsets, sequences, and graphs. However, the bottleneck of...
Xifeng Yan, Hong Cheng, Jiawei Han, Dong Xin