Extracting titles from a PDFs full text is an important task in information retrieval to identify PDFs. Existing approaches apply complicated and expensive (in terms of calculating...
We propose a novel semi-supervised method for building a statistical model that represents the relationship between sounds and text labels (“tags”). The proposed method, named...
Jun Takagi, Yasunori Ohishi, Akisato Kimura, Masas...
Wikipedia despite having a very small budget has been among the top ten most visited websites for over half a decade. Being this visible also generated the problem of ill intended ...
This paper describes the participation of UAIC team at the LogCLEF 2011 initiative, language identification task. Our approach is an aggregation of known methods for recognizing la...
The Chinese comma signals the boundary of discourse units and also anchors discourse relations between adjacent text spans. In this work, we propose a discourse structureoriented ...