The aim of this paper is twofold. We focus, on the one hand, on the task of dynamically annotating English compound nouns, and on the other hand we propose disambiguation methods ...
CASIA-CASSIL is a large-scale corpus base of Chinese human-human naturally-occurring telephone conversations in restricted domains. The first edition consists of 792 90-second con...
In Chinese texts, words composed of single or multiple characters are not separated by spaces, unlike most western languages. Therefore Chinese word segmentation is considered an ...
In this paper, we describe a new multi-purpose audio-visual database on the context of speech interfaces for controlling household electronic devices. The database comprises speec...
In this paper, we base on the syntactic structural Chinese Treebank corpus, construct the Chinese Opinon Treebank for the research of opinion analysis. We introduce the tagging sc...