Web spam can significantly deteriorate the quality of search engines. Early web spamming techniques mainly manipulate page content. Since linkage information is widely used in we...
This paper describes the participation of the INRIA group in the INEX 2007 XML entity ranking and ad hoc tracks. We developed a system for ranking Wikipedia entities in answer to a...
Anne-Marie Vercoustre, Jovan Pehcevski, James A. T...
This paper describes a new bipartite formulation for word-document co-clustering such that hyperclique patterns, strongly affiliated documents in this case, are guaranteed not to ...
Tianming Hu, Chao Qu, Chew Lim Tan, Sam Yuan Sung,...
Re-ranking the search results using PageRank is a well-known technique used in modern search engines. Running an iterative algorithm like PageRank on a large web graph consumes bo...
We have developed a web-repository crawler that is used for reconstructing websites when backups are unavailable. Our crawler retrieves web resources from the Internet Archive, Go...