Most studies in statistical or machine learning based authorship attribution focus on two or a few authors. This leads to an overestimation of the importance of the features extra...
Annotated corpora are only useful if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date...
Given a classifier trained on relatively few training examples, active learning (AL) consists in ranking a set of unlabeled examples in terms of how informative they would be, if ...
Andrea Esuli, Diego Marcheggiani, Fabrizio Sebasti...
In this paper we exploit Semantic Vectors to develop an IR system. The idea is to use semantic spaces built on terms and documents to overcome the problem of word ambiguity. Word ...
Pierpaolo Basile, Annalina Caputo, Giovanni Semera...
Previous content extraction evaluations have neglected to address problems which complicate the incorporation of extracted information into an existing knowledge base. Previous qu...
Paul McNamee, Hoa Trang Dang, Heather Simpson, Pat...