Hierarchies provide a means of organizing, summarizing and accessing information. We describe a method for automatically generating hierarchies from small collections of text, and...
Hypothesis generation is a crucial initial step for making scientific discoveries. This paper addresses the problem of automatically discovering interesting hypotheses from the we...
Focused crawlers are considered as a promising way to tackle the scalability problem of topic-oriented or personalized search engines. To design a focused crawler, the choice of s...
This paper proposes a strategy to reduce the amount of hardware involved in the solution of search engine queries. It proposes using a secondary compact cache that keeps minimal i...
We profile a system for search and analysis of largescale email archives. The system builds around four facets: Content-based search engine, statistical topic model, automaticall...