Today, valuable business information is increasingly stored as unstructured data (documents, emails, etc.). For example, documents exchanged between business partners capture info...
In this paper, we describe a system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) t...
Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure give...
In this paper, I introduce the DICI, an electronic dictionary of Italian collocations designed to support the acquisition of the collocational competence in learners of Italian as...
This paper investigates whether high-quality annotations for tasks involving semantic disambiguation can be obtained without a major investment in time or expense. We examine the ...
Sara Rosenthal, William Lipovsky, Kathleen McKeown...