We are experiencing a new Social Web, where people share, communicate, commiserate, and conflict with each other. As evidenced by systems like Wikipedia, twitter, and delicious.co...
Recent work on incremental crawling has enabled the indexed document collection of a search engine to be more synchronized with the changing World Wide Web. However, this synchron...
Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey...
We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We propose 3 measures for quantifying the alienness (...
Abstract. This paper introduces our method for causal knowledge retrieval from the Internet resources, its results and evaluation of using it in utterance creation process. Our sys...
Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine’s results. While human experts can identify spam, it is too expensive to manual...