Web search quality can vary widely across languages, even for the same information need. We propose to exploit this variation in quality by learning a ranking function on bilingua...
Probabilistic latent semantic indexing (PLSI) represents documents of a collection as mixture proportions of latent topics, which are learned from the collection by an expectation...
Alexander Hinneburg, Hans-Henning Gabriel, Andr&eg...
The interaction history of a document can be modelled as a tree of command objects. This model does not only support recovery (undo/redo), but is also suitable for cooperation betw...
Common document clustering algorithms utilize models that either divide a corpus into smaller clusters or gather individual documents into clusters. Hierarchical Agglomerative Clus...
Abstract. Nowadays, multimedia documents composed of text and images are increasingly used, thanks to the Internet and the increasing capacity of data storage. It is more and more ...