Compression of term frequency lists and very long document-id lists within an inverted file search engine are examined. Several compression schemes are compared including Elias γ...
Several problems in text categorization are too hard to be solved by standard bag-of-words representations. Work in kernel-based learning has approached this problem by (i) consid...
This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achi...
Most up-to-date well-behaved topic-based summarization systems are built upon the extractive framework. They score the sentences based on the associated features by manually assig...
Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources o...