The Minimum-Entropy Clustering (MEC) algorithm proposed in this paper provides an optimal method for addressing the non-stationarity of a source with respect to entropy coding. Th...
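The link between a non-stationary source and entropy coding can be seen in a toy calculation. This is not the MEC algorithm. It is a sketch under stated assumptions: an ideal zeroth-order coder that spends n·H bits on n symbols, and no charge for describing the models or the segment boundary. The helper names `empirical_entropy` and `ideal_code_length` are hypothetical. When the symbol statistics shift partway through, one pooled model costs more than one model per stationary segment.

```python
import math
from collections import Counter


def empirical_entropy(symbols):
    """Shannon entropy (bits/symbol) of the empirical distribution of `symbols`."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def ideal_code_length(symbols):
    """Hypothetical ideal zeroth-order code length in bits: n * H(empirical)."""
    return len(symbols) * empirical_entropy(symbols)


# Toy non-stationary source: the statistics flip halfway through.
segment_a = "aaaaaaab" * 4  # 32 symbols, mostly 'a'
segment_b = "bbbbbbba" * 4  # 32 symbols, mostly 'b'

# One model for the whole sequence vs. one model per segment
# (model-description and boundary costs are ignored in this sketch).
pooled = ideal_code_length(segment_a + segment_b)
split = ideal_code_length(segment_a) + ideal_code_length(segment_b)

print(f"pooled: {pooled:.2f} bits, split: {split:.2f} bits")
```

Here the pooled model sees a 50/50 mix and spends 64 bits. The two per-segment models spend about 34.8 bits in total. A real scheme must also pay for the extra model parameters and segmentation side information, and this sketch leaves those costs out.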
In this paper, we introduce the concept of a QA-Pagelet to refer to the content region in a dynamic page that contains query matches. We present THOR, a scalable and efficient min...
Part of the process of data integration is determining which sets of identifiers refer to the same real-world entities. In integrating databases found on the Web or obtained by us...
Most research on Internet topology is based on active measurement methods. A major difficulty in using these tools is that one comes across many unresponsive routers. Different m...
This paper presents a new algorithm, Kernel Bisecting k-means and Sample Removal (KBK-SR), as a sampling preprocessing step for SVM training to improve scalability. The novel c...
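The method's name builds on bisecting k-means, which repeatedly splits one cluster in two with 2-means until k clusters exist. The sketch below shows only that generic, input-space procedure. It is not KBK-SR: the kernel-space variant and the paper's sample-removal rule are not reproduced. Some choices here are assumptions, not taken from the source: the split target is the cluster with the largest sum of squared errors, and 2-means uses a deterministic initialisation (the first point plus the point farthest from it). The function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    d = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(d))


def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(cluster)
    return sum(sq_dist(p, c) for p in cluster)


def two_means(points, iters=20):
    """Split `points` with 2-means; returns (points, []) if no split is possible."""
    # Deterministic init (assumption): first point and the point farthest from it.
    c1 = points[0]
    c2 = max(points, key=lambda p: sq_dist(p, c1))
    left, right = list(points), []
    for _ in range(iters):
        new_left = [p for p in points if sq_dist(p, c1) <= sq_dist(p, c2)]
        new_right = [p for p in points if sq_dist(p, c1) > sq_dist(p, c2)]
        if not new_left or not new_right:
            break  # degenerate split, keep the last valid assignment
        left, right = new_left, new_right
        n1, n2 = centroid(left), centroid(right)
        if (n1, n2) == (c1, c2):
            break  # converged
        c1, c2 = n1, n2
    return left, right


def bisecting_kmeans(points, k):
    """Generic bisecting k-means: split the highest-SSE cluster until k clusters."""
    clusters = [list(points)]
    while len(clusters) < k:
        target = max(clusters, key=sse)
        left, right = two_means(target)
        if not right:
            break  # remaining clusters cannot be split further
        clusters.remove(target)
        clusters.extend([left, right])
    return clusters


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print([sorted(c) for c in bisecting_kmeans(pts, 2)])
```

In a sampling preprocessor for SVM training, clusters like these would be the units from which training samples are kept or removed. The selection criterion is specific to the paper and is not shown here.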