An effective graphic interface is a key tool to improve the fruition of the results retrieved by an Information Retrieval (IR) system. In this work, we describe a two-dimensional...
Lorenzo De Stefani, Giorgio Maria Di Nunzio, Giorg...
The Dirichlet compound multinomial (DCM) distribution, also called the multivariate Polya distribution, is a model for text documents that takes into account burstiness: the fact ...
We propose a hybrid, unsupervised document clustering approach that combines a hierarchical clustering algorithm with Expectation Maximization. We developed several heuristics to ...
Document collections evolve over time, new topics emerge and old ones decline. At the same time, the terminology evolves as well. Much literature is devoted to topic evolution in ...
In this paper we propose a character segmentation method for multispectral images of ancient documents. Due to the low quality of the images the main idea of this study is to comb...