—A novel method CLOSS intended for textual databases is proposed. It successfully identifies misspelled string clusters, even if the cluster border is not prominent. The method ...
Distribution data naturally arise in countless domains, such as meteorology, biology, geology, industry and economics. However, relatively little attention has been paid to data m...
Starting from an axiomatization of a generalization of Shannon entropy we introduce a set of axioms for a parametric family of distances over sets of partitions of finite sets. T...
In real-world data mining applications, an accurate ranking is same important to a accurate classification. Naive Bayes (simply NB) has been widely used in data mining as a simple...
Liangxiao Jiang, Harry Zhang, Zhihua Cai, Jiang Su
A lift curve, with the true positive rate on the y-axis and the customer pull (or contact) rate on the x-axis, is often used to depict the model performance in many data mining ap...