Abstract. The massive amount of textual data on the Web raises numerous classification problems. Although the notion of domain is widely acknowledged in the IR field, the applica...
In sequential prediction tasks, one repeatedly tries to predict the next element in a sequence. A classical way to solve these problems is to fit an order-n Markov model to the da...
Participating in a text retrieval conference for the first time, Eidetica has run six minimalistic tests with its t·repository indexer, doing as little tuning as possible, in ord...
This paper describes an unsupervised algorithm for segmenting categorical time series. The algorithm first collects statistics about the frequency and boundary entropy of ngrams, t...
It is important for future NLP systems to formulate the semantic equivalence (and more generally, the semantic similarity) of natural language expressions. In particular, paraphra...