Sciweavers

5213 search results - page 897 / 1043
» Statistical entity-topic models
Sort
View
KDD
2007
ACM
136views Data Mining» more  KDD 2007»
16 years 6 months ago
Information genealogy: uncovering the flow of ideas in non-hyperlinked document databases
We now have incrementally-grown databases of text documents ranging back for over a decade in areas ranging from personal email, to news-articles and conference proceedings. While...
Benyah Shaparenko, Thorsten Joachims
KDD
2006
ACM
201views Data Mining» more  KDD 2006»
16 years 6 months ago
Clustering based large margin classification: a scalable approach using SOCP formulation
This paper presents a novel Second Order Cone Programming (SOCP) formulation for large scale binary classification tasks. Assuming that the class conditional densities are mixture...
J. Saketha Nath, Chiranjib Bhattacharyya, M. Naras...
KDD
2005
ACM
125views Data Mining» more  KDD 2005»
16 years 6 months ago
Email data cleaning
Addressed in this paper is the issue of `email data cleaning' for text mining. Many text mining applications need take emails as input. Email data is usually noisy and thus i...
Jie Tang, Hang Li, Yunbo Cao, ZhaoHui Tang
KDD
2004
ACM
134views Data Mining» more  KDD 2004»
16 years 6 months ago
Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pair
Given a user-specified minimum correlation threshold and a market basket database with N items and T transactions, an all-strong-pairs correlation query finds all item pairs with...
Hui Xiong, Shashi Shekhar, Pang-Ning Tan, Vipin Ku...
KDD
2001
ACM
163views Data Mining» more  KDD 2001»
16 years 6 months ago
The "DGX" distribution for mining massive, skewed data
Skewed distributions appear very often in practice. Unfortunately, the traditional Zipf distribution often fails to model them well. In this paper, we propose a new probability di...
Zhiqiang Bi, Christos Faloutsos, Flip Korn