Part-of-speech (POS) induction is one of the most popular tasks in research on unsupervised NLP. Many different methods have been proposed, yet comparisons are difficult to make s...
Understanding how users tailor their SPARQL queries is crucial when designing query evaluation engines or fine-tuning RDF stores with performance in mind. In this paper we analyz...
The recently proposed ImageNet dataset consists of several million images, each annotated with a single object category. However, these annotations may be imperfect, in the sense t...
Evaluation forums such as TREC allow systematic measurement and comparison of information retrieval techniques. The goal is consistent improvement, based on reliable comparison of...
Timothy G. Armstrong, Alistair Moffat, William Web...
We describe a task-based evaluation to determine whether multi-document summaries measurably improve user performance when using online news browsing systems for directed research...
Kathleen McKeown, Rebecca J. Passonneau, David K. ...