Duplicate URLs cause serious problems across the whole search engine pipeline, from crawling and indexing to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
Big data presents new challenges to both cluster infrastructure software and parallel application design. We present a set of software services and design principles for data inte...
Yogesh Simmhan, Roger S. Barga, Catharine van Inge...
With the recent advancements and wide usage of location detection devices, large quantities of data are collected by GPS and cellular technologies in the form of trajectories. Whi...
Marcos R. Vieira, Petko Bakalov, Vassilis J. Tsotr...
In this paper, a mean shift-based clustering algorithm is proposed. The mean shift is a kernel-type weighted mean procedure. Herein, we first discuss three classes of Gaussian, C...
Many computer vision and pattern recognition algorithms are very sensitive to the choice of an appropriate distance metric. Some recent research sought to address a variant of the...