Duplicate URLs cause serious problems across the whole search engine pipeline, from crawling and indexing to result serving. URL normalization transforms duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
The similarity join has become an important database primitive to support similarity search and data mining. A similarity join combines two sets of complex objects such t...
Many high-performance isosurface extraction algorithms have been proposed in the past several years as a result of intensive research efforts. When applying these algorithms to la...
A new statistical method called "bilingual chunking" for structure alignment is proposed. Unlike existing approaches, which align hierarchical structures like...
Wei Wang, Ming Zhou, Jin-Xia Huang, Changning Huan...
Co-occurrence data is quite common in many real applications. Latent Semantic Analysis (LSA) has been successfully used to identify semantic relations in such data. However, LSA c...