We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We propose 3 measures for quantifying the alienness (...
With the vast number of potentially relevant documents on the Web, a key question for a retrieval system is how to achieve high-accuracy retrieval under the current Web setting. The w...
The number of vertical search engines and portals has increased rapidly over the last few years, making the importance of a topic-driven (focused) crawler evident. In this paper, we de...
Query ambiguity is a generally recognized problem, particularly in Web environments where queries are commonly only one or two words in length. In this study, we explore one techn...
We present a novel interpretation of Clarity [5], a widely used query performance predictor. While Clarity is commonly described as a measure of the “distance” between the lan...
Shay Hummel, Anna Shtok, Fiana Raiber, Oren Kurlan...