We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon's Mechanical Turk. The corpus compiles 32 452 edits on 28 468 Wikipedia articles, a...
In this paper, a language model adapted to graph-based representation of image content is proposed and assessed. The full indexing and retrieval processes are evaluated on two diff...
Text clustering is an established technique for improving quality in information retrieval, for both centralized and distributed environments. However, for highly distributed envir...
The IDEX system is a prototype of an interactive dynamic Information Extraction (IE) system. A user of the system expresses an information request in the form of a topic descripti...
Parallel corpora are a valuable resource for tasks such as cross-language information retrieval and data-driven natural language processing systems. Previously only small scale cor...