This paper addresses the issue of devising a new document prior for the language modeling (LM) approach for Information Retrieval. The prior is based on term statistics, derived in...
Many Japanese ideographs (Chinese characters) have several meanings. Such ambiguities should be resolved using contextual information. For example, we have an ideo...
In this paper, we argue that agglomerative clustering with the vector cosine similarity measure performs poorly for two reasons. First, the nearest neighbors of a document belo...
In medicine, many exceptions occur. In medical practice, and in knowledge-based systems too, it is necessary to consider them and to deal with them appropriately. In medical studies ...
In this paper, a new language model, the Multi-Class Composite N-gram, is proposed to avoid the data sparseness problem in spoken language, where it is difficult to collect traini...