Previous research in cross-document entity coreference has generally been restricted to the offline scenario where the set of documents is provided in advance. As a consequence, t...
Due to the great variation of biological names in biomedical text, appropriate tokenization is an important preprocessing step for biomedical information retrieval. Despite its im...
Image classification is often used to extract information from multi-spectral satellite images. Unsupervised methods can produce results well adjusted to the data, but that are us...
Temporal text mining deals with discovering temporal patterns in text over a period of time. A Theme Evolution Graph (TEG) is used to visualize when new themes are created and how...
Image spam is a new obfuscation method that spammers invented to bypass conventional text-based spam filters more effectively. In this paper, we extract local invariant features ...
Haiqiang Zuo, Weiming Hu, Ou Wu, Yunfei Chen, Guan...