Unicode is becoming a dominant character representation format for information processing. This presents a very dangerous usability and security problem for many applications. The...
Anthony Y. Fu, Xiaotie Deng, Liu Wenyin, Greg Litt...
In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two ...
Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, ...
Natural language understanding involves the simultaneous consideration of a large number of different sources of information. Traditional methods employed in language an...
In this paper we examine 1) the scope of geo-ontologies used especially for the purposes of information retrieval on the Web, 2) the core geographical concepts and their mutual re...
Hypothesis generation is a crucial initial step for making scientific discoveries. This paper addresses the problem of automatically discovering interesting hypotheses from the we...