This paper studies the influence of lexical semantic knowledge upon two related tasks: ad-hoc information retrieval and text similarity. For this purpose, we compare the performa...
The detection of image versions from large image collections is a formidable task as two images are rarely identical. Geometric variations such as cropping, rotation, and slight p...
Abstract. Internet users are becoming increasingly concerned about their personal information being collected and used by Web service providers. They want to ensure that it is stor...
Web spam research has been hampered by a lack of statistically significant collections. In this paper, we perform the first large-scale characterization of web spam using conten...
Near-duplicate detection is not only an important pre and post processing task in Information Retrieval but also an effective spam-detection technique. Among different approache...