Sciweavers

2840 search results - page 347 / 568
» Learning Phrasal Categories
Sort
View
KDD
2008
ACM
183views Data Mining» more  KDD 2008»
16 years 7 months ago
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
KDD
2007
ACM
159views Data Mining» more  KDD 2007»
16 years 7 months ago
Local decomposition for rare class analysis
Given its importance, the problem of predicting rare classes in large-scale multi-labeled data sets has attracted great attentions in the literature. However, the rare-class probl...
Junjie Wu, Hui Xiong, Peng Wu, Jian Chen
KDD
2007
ACM
154views Data Mining» more  KDD 2007»
16 years 7 months ago
Canonicalization of database records using adaptive similarity measures
It is becoming increasingly common to construct databases from information automatically culled from many heterogeneous sources. For example, a research publication database can b...
Aron Culotta, Michael L. Wick, Robert Hall, Matthe...
KDD
2006
ACM
179views Data Mining» more  KDD 2006»
16 years 7 months ago
Extracting key-substring-group features for text classification
In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...
Dell Zhang, Wee Sun Lee
KDD
2006
ACM
115views Data Mining» more  KDD 2006»
16 years 7 months ago
Supervised probabilistic principal component analysis
Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When label...
Shipeng Yu, Kai Yu, Volker Tresp, Hans-Peter Krieg...