One of the unique challenges in Chinese Language Processing is cross-strait named entity recognition. Due to the adoption of different transliteration strategies, foreign name tra...
Petr Simon, Chu-Ren Huang, Shu-Kai Hsieh, Jia-Fei ...
A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segm...
Elizabeth Shriberg, Andreas Stolcke, Dilek Z. Hakk...
This paper describes a method for optimizing the cost matrix of any approximate string matching algorithm based on the Levenshtein distance. The method, which uses genetic algorit...
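To make the notion of a cost matrix concrete, the following is a minimal sketch of a Levenshtein distance with configurable edit costs. It is not the paper's implementation: the function name `weighted_levenshtein`, the `sub_cost` dictionary, and the default cost of 1.0 are assumptions chosen for illustration, and the genetic-algorithm search over the costs is not shown.

```python
def weighted_levenshtein(a, b, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Edit distance between strings a and b with configurable costs.

    sub_cost is a hypothetical mapping (char_from, char_to) -> cost;
    pairs that are not listed fall back to a substitution cost of 1.0.
    """
    sub_cost = sub_cost or {}
    # prev[j] holds the cost of turning the processed prefix of a into b[:j].
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        curr = [i * del_cost]
        for j, cb in enumerate(b, 1):
            # Matching characters cost nothing; otherwise look up the pair.
            s = 0.0 if ca == cb else sub_cost.get((ca, cb), 1.0)
            curr.append(min(
                prev[j] + del_cost,      # delete ca
                curr[j - 1] + ins_cost,  # insert cb
                prev[j - 1] + s,         # substitute ca -> cb (or match)
            ))
        prev = curr
    return prev[-1]


uniform = weighted_levenshtein("kitten", "sitting")
cheaper = weighted_levenshtein("kitten", "sitting", sub_cost={("k", "s"): 0.5})
```

With uniform costs this reduces to the standard Levenshtein distance. An optimizer such as a genetic algorithm would treat the entries of `sub_cost` (and possibly `ins_cost` and `del_cost`) as the parameters to tune against a task-specific objective.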
Most common feature selection techniques for document categorization are supervised and require lots of training data in order to accurately capture the descriptive and d...
Position information has been proved to be very effective in document summarization, especially in generic summarization. Existing approaches mostly consider the information of se...