Abstract. Parallel corpora are a valuable resource for machine translation, but at present their availability and utility is limited by genreand domain-speci city, licensing restri...
Abstract. Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separa...
Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Mi...
A new method for augmenting paper documents with electronic information is described that does not modify the format of the paper document in any way. Applicable to both commercia...
Jonathan J. Hull, Berna Erol, Jamey Graham, Qifa K...
We present the STEX system, a semantic extension of LATEX, that allows for producing high-quality PDF documents for (proof)reading and printing, as well as semantic XML/OMDoc docu...
Andrea Kohlhase, Michael Kohlhase, Christoph Lange...
We propose a new method to select relevant images to the given keywords from images gathered from the Web based on the Probabilistic Latent Semantic Analysis (PLSA) model which is...