Sciweavers

» Corpora and data preparation
IJCNLP
2005
Springer
15 years 11 months ago
A Chunking Strategy Towards Unknown Word Detection in Chinese Word Segmentation
This paper proposes a chunking strategy to detect unknown words in Chinese word segmentation. First, a raw sentence is pre-segmented into a sequence of word atoms using a maximum...
Guodong Zhou
AAAI
2008
15 years 8 months ago
Cross-lingual Propagation for Morphological Analysis
Multilingual parallel text corpora provide a powerful means for propagating linguistic knowledge across languages. We present a model which jointly learns linguistic structure for...
Benjamin Snyder, Regina Barzilay
LREC
2008
15 years 7 months ago
Thai Broadcast News Corpus Construction and Evaluation
Large speech and text corpora are crucial to the development of a state-of-the-art speech recognition system. This paper reports on the construction and evaluation of the first Th...
Markpong Jongtaveesataporn, Chai Wutiwiwatchai, Ko...
LREC
2008
15 years 7 months ago
The ATCOSIM Corpus of Non-Prompted Clean Air Traffic Control Speech
Air traffic control (ATC) is based on voice communication between pilots and controllers and uses a highly task- and domain-specific language. Due to this very reason, spoken langu...
Konrad Hofbauer, Stefan Petrik, Horst Hering
LREC
2008
15 years 7 months ago
CORP-ORAL: Spontaneous Speech Corpus for European Portuguese
Research activity on the Portuguese language for speech synthesis and recognition has suffered from a considerable lack of human and material resources. This has raised some obsta...
Fabíola Santos, Tiago Freitas