We are interested in diacritizing Semitic languages, especially Syriac, using only diacritized texts. Previous methods have required the use of tools such as part-of-speech tagger...
This paper presents an efficient inference algorithm for conditional random fields (CRFs) on large-scale data. Our key idea is to decompose the output label state into an active s...
This paper presents two pivot strategies for statistical machine transliteration, namely a system-based pivot strategy and a model-based pivot strategy. Given two independent source-p...
Min Zhang, Xiangyu Duan, Vladimir Pervouchine, Hai...
We describe an effective constituent projection strategy, where constituent projection is performed on the basis of dependency projection. In particular, a novel measurement is propo...
Shrinkage-based exponential language models, such as the recently introduced Model M, have provided significant gains over a range of tasks [1]. Training such models requires a l...
Abhinav Sethy, Stanley F. Chen, Bhuvana Ramabhadra...