Active learning aims to reduce the amount of labels required for classification. The main difficulty is to find a good trade-off between exploration and exploitation of the lab...
Real-world data -- especially when generated by distributed measurement infrastructures such as sensor networks -- tends to be incomplete, imprecise, and erroneous, making it impo...
Clustering in data mining is a discovery process that groups a set of data such that the intracluster similarity is maximized and the intercluster similarity is minimized. These d...
Eui-Hong Han, George Karypis, Vipin Kumar, Bamshad...
Programming forums are becoming the primary tools for programmers to find answers for their programming problems. Our empirical study of popular programming forums shows that the...
While extensive work has been done on evaluating queries over tuple-independent probabilistic databases, query evaluation over correlated data has received much less attention eve...