We describe Castanet, an algorithm for automatically generating hierarchical faceted metadata from textual descriptions of items, to be incorporated into browsing and navigation i...
Focussed XML component retrieval is one of the most important challenges in the XML IR field. The aim of the focussed retrieval strategy is to find the most exhaustive and specifi...
We present a term recognition approach to extract acronyms and their definitions from a large text collection. Parenthetical expressions appearing in a text collection are identif...
In this paper, we present an automated, quantitative, knowledge-poor method to evaluate the randomness of a collection of documents (corpus), with respect to a number of biased pa...
Web search engines compete to offer the fastest responses with the highest relevance. However, as Web collections grow, it becomes more difficult to achieve this goal. As most user...