Similarity measures are mechanisms that assign a numeric score indicating how closely two documents, or a document and a query, match. The cosine measure is one of the similarity m...
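To make the cosine measure concrete, here is a minimal sketch that scores a document against a query as the cosine of the angle between their term-frequency vectors. It assumes lowercase whitespace tokenization and raw term counts with no weighting such as TF-IDF. The helper names `term_frequencies` and `cosine_similarity` are hypothetical and are not taken from the abstract.

```python
import math
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Raw term counts after lowercase whitespace tokenization (a simplifying assumption)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Dot product over the terms the two vectors share.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    # Euclidean length of each vector.
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)


doc = term_frequencies("the cat sat on the mat")
query = term_frequencies("cat on mat")
print(round(cosine_similarity(doc, query), 4))  # 0.6124
```

Because the score is normalized by both vector lengths, a long document is not favored simply for repeating query terms more often. Identical texts score 1.0, and texts with no shared terms score 0.0.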
Given an abundance of resources, users prefer to navigate subjects through organized topics rather than through lists of pages retrieved from search engines. We propose a framework to cluster frequent items...
Good documentation benefits every software development project, especially large ones, but it can be hard, costly, and tiresome to produce when not supported by appropriate tools...
In this paper we analyze our recent research on the use of document analysis techniques for metadata extraction from PDF papers. We describe a package that is designed to extract ...
This paper follows a formal approach to information retrieval based on statistical language models. By introducing some simple reformulations of the basic language modeling approa...
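For orientation, the sketch below shows one common form of the basic language modeling approach to retrieval: query likelihood, where documents are ranked by the log-probability of the query under each document's unigram model. The paper's own reformulations are truncated above and are not reproduced here. The choice of Jelinek-Mercer smoothing, the weight `lam = 0.5`, skipping query terms absent from the whole collection, and the function name `query_log_likelihood` are all assumptions made for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query: list[str], doc: list[str], collection: Counter,
                         collection_len: int, lam: float = 0.5) -> float:
    """Sum of log P(term | doc) over query terms, Jelinek-Mercer smoothed (assumed)."""
    doc_tf = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p_coll = collection[term] / collection_len  # background (collection) model
        if p_coll == 0:
            continue  # term unseen anywhere in the collection: skipped (assumption)
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0  # maximum-likelihood document model
        # Interpolate the document model with the collection model.
        score += math.log((1 - lam) * p_doc + lam * p_coll)
    return score


docs = {
    "d1": "language models for information retrieval".split(),
    "d2": "clustering frequent items in web pages".split(),
}
collection = Counter(t for d in docs.values() for t in d)
collection_len = sum(collection.values())
query = "language retrieval".split()

ranking = sorted(
    docs,
    key=lambda d: query_log_likelihood(query, docs[d], collection, collection_len),
    reverse=True,
)
print(ranking)  # ['d1', 'd2']
```

Smoothing with the collection model keeps a document that lacks one query term from receiving zero probability, which is why `d2` still gets a finite (but lower) score.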