User generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's in...
Wouter Weerkamp, Krisztian Balog, Maarten de Rijke
Abstract. Estimating the degree of similarity between images is a challenging task as the similarity always depends on the context. Because of this context dependency, it seems qui...
This paper develops a general, formal framework for modeling term dependencies via Markov random fields. The model allows for arbitrary text features to be incorporated as eviden...
We present and evaluate various content-based recommendation models that make use of user and item profiles defined in terms of weighted lists of social tags. The studied approach...
Information retrieval algorithms leverage various collection statistics to improve performance. Because these statistics are often computed on a relatively small evaluation corpus...