A variety of information extraction techniques rely on the fact that instances of the same relation are "distributionally similar," in that they tend to appear in simila...
The k-means algorithm is the method of choice for clustering large-scale data sets and it performs exceedingly well in practice. Most of the theoretical work is restricted to the c...
It is recognized that many evaluation metrics of machine translation in use that focus on surface word level suffer from their lack of tolerance of linguistic variance, and the in...
We propose an original automatic alignment of definitions taken from different dictionaries that could be associated to the same concept although they may have different labels. Th...
Laura Diosan, Alexandrina Rogozan, Jean-Pierre P&e...
In this paper, we argue that the agglomerative clustering with vector cosine similarity measure performs poorly due to two reasons. First, the nearest neighbors of a document belo...