This paper describes our participation in the GeoCLEF monolingual English task of the Cross Language Evaluation Forum 2006. The main objective of this study is to evaluate the retr...
XML is rapidly becoming one of the most widely adopted technologies for information exchange and representation. As the use of XML becomes more widespread, we foresee the developme...
Traditional information retrieval systems use query words to identify relevant documents. In difficult retrieval tasks, however, one needs access to a wealth of background knowled...
This paper describes how to automatically cross-reference documents with Wikipedia: the largest knowledge base ever known. It explains how machine learning can be used to identify...
Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can p...
Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyv...