Empirical performance evaluation is the process of measuring and calculating performance metrics of deployed software systems. It is a part of performance validation during testin...
Translation systems are generally trained to optimize BLEU, but many alternative metrics are available. We explore how optimizing toward various automatic evaluation metrics (BLEU...
Daniel Cer, Christopher D. Manning, Daniel Jurafsk...
The systems and networking community treasures "simple" system designs, but our evaluation of system simplicity often relies more on intuition and qualitative discussion...
We illustrate and explain problems of n-gram-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPO...
We propose two metrics to demonstrate the impact of integrating human-computer interaction (HCI) activities in software engineering (SE) processes. User experience metric (UXM) is a ...