If multiple evaluators analyse the outcomes of a single user test, the agreement between their lists of identified usability problems tends to be limited. This is called the ‘e...
Arnold P. O. S. Vermeeren, Ilse van Kesteren, Math...
We present the methodology that underlies new metrics for semantic machine translation evaluation that we are developing. Unlike widely used lexical and n-gram based MT evaluation...
The need for automated text evaluation is common to several AI disciplines. In this work, we explore the use of Machine Translation (MT) evaluation metrics for Textual Case Based R...
Ibrahim Adeyanju, Nirmalie Wiratunga, Robert Lothi...
Automatic summarization evaluation is very important to the development of summarization systems. In text summarization, ROUGE has been shown to correlate well with huma...
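For readers unfamiliar with ROUGE, the sketch below computes ROUGE-N recall: the number of reference n-grams that also appear in the candidate summary (with counts clipped), divided by the total number of reference n-grams across all references. It is a minimal illustration that assumes plain whitespace tokenization; the official ROUGE toolkit adds options such as stemming and stopword removal that are not reproduced here, and the function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    # Count every contiguous n-gram in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, references, n=2):
    """ROUGE-N recall: clipped n-gram matches over total reference n-grams."""
    cand = ngrams(candidate.split(), n)  # assumption: whitespace tokenization
    overlap = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.split(), n)
        # A reference n-gram matches at most as often as it occurs in the candidate.
        overlap += sum(min(count, cand[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return overlap / total if total else 0.0

print(rouge_n_recall("the cat sat on the mat", ["the cat is on the mat"], n=2))
```

With one reference, three of its five bigrams ("the cat", "on the", "the mat") occur in the candidate, giving a ROUGE-2 recall of 0.6.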
Accurate estimation of information retrieval evaluation metrics such as average precision requires large sets of relevance judgments. Building sets large enough for evaluation of r...
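To see why average precision depends on relevance judgments, consider the standard non-interpolated definition sketched below: precision is taken at the rank of each relevant document retrieved, and the sum is divided by the total number of relevant documents. Any relevant document that is unjudged (and therefore treated as non-relevant) or never retrieved changes the score. This is a minimal sketch; the function name and the document IDs are hypothetical.

```python
def average_precision(ranking, relevant):
    """Non-interpolated AP: sum of precision@k at each relevant rank k,
    divided by the number of known relevant documents."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)

print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
```

Here the relevant documents appear at ranks 1 and 3, so AP is (1/1 + 2/3) / 2, about 0.833. If the judgments instead listed "d5" as the second relevant document, AP would drop to 0.5 for the same ranking.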