For effective training of acoustic and language models for spontaneous speech such as meetings, it is important to exploit texts that are available on a large scale, which may not b...
Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document met...
With the increase in popularity of online review sites comes a corresponding need for tools capable of extracting the information most important to the user from the plain text da...
In this paper we present CUTER, a system that processes HTML pages in order to extract the useful text from them. The mechanism is focused on HTML pages that include news articl...
George Adam, Christos Bouras, Vassilis Poulopoulos