The identification and analysis of the knowledge available in document form is a key element of corporate knowledge management. In engineering-intensive organizations, it involves...
We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon's Mechanical Turk. The corpus compiles 32 452 edits on 28 468 Wikipedia articles, a...
In this paper, a language model adapted to graph-based representation of image content is proposed and assessed. The full indexing and retrieval processes are evaluated on two diff...
Text clustering is an established technique for improving quality in information retrieval, for both centralized and distributed environments. However, for highly distributed envir...
The IDEX system is a prototype of an interactive dynamic Information Extraction (IE) system. A user of the system expresses an information request in the form of a topic descripti...