We investigate an inherent limitation of top-down decision tree induction in which the continuous partitioning of the instance space progressively lessens the statistical support o...
In some retrieval situations, a system must search across multiple collections. This task, referred to as federated search, occurs, for example, when searching a distributed index o...
Mining frequent patterns is a major topic in data mining research, resulting in many seminal papers and algorithms on item set and episode discovery. The combination of these, call...
It is well known that naive Bayes performs surprisingly well in classification, but its probability estimation is poor. In many applications, however, a ranking based on class prob...
Stability is a common tool to verify the validity of sample-based algorithms. In clustering, it is widely used to tune the parameters of the algorithm, such as the number k of clust...