We present a study of new word identification (NWI) to improve the performance of a Chinese word segmenter. In this paper, the distribution and types of new words are discussed emp...
Clustering by document concepts is a powerful way of retrieving information from a large number of documents. This task in general does not make any assumption on the data distrib...
Similarity search and data mining often rely on distance or similarity functions in order to provide meaningful results and semantically meaningful patterns. However, standard dist...
Tobias Emrich, Franz Graf, Hans-Peter Kriegel, Mat...
We present a new distributed genetic algorithm that can be used to extract useful information from distributed, large data over the network. The main idea of the proposed algori...
Hyunjung Lee, Byonghwa Oh, Jihoon Yang, Seonho Kim
Recent studies have shown that canonical algorithms such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) can be obtained from graph-based dimensionality ...