Multinomial distributions over words are frequently used to model topics in text collections. A common, major challenge in applying all such topic models to any text mining proble...
In this paper, we present a multi-level recognizer for online Arabic handwriting. In Arabic script (handwritten and printed), cursive writing – is not a style – it is an inher...
In this paper we propose a system to solve a language game, called Guillotine, which requires a player with a strong cultural and linguistic background knowledge. The player obser...
Pierpaolo Basile, Marco Degemmis, Pasquale Lops, G...
This paper describes a rather simplistic method of unsupervised morphological analysis of words in an unknown language. All what is needed is a raw text corpus in the given langua...
We present a new statistical compression method, which we call Phrase Based Dense Code (PBDC), aimed at compressing large digital libraries. PBDC compresses the text collection to ...