Most Web-based methods for lexicon augmentation consist of capturing global semantic features of the targeted domain in order to collect relevant documents from the Web. We s...
This paper describes DUTIR at the TREC 2007 Blog Track. In data preprocessing, a non-English language list created from the corpus was used to remove the non-English blogs, blog templ...
Rui Song, Qin Tang, Daming Shi 0002, Hongfei Lin, ...
The abundance of content on the web and the lack of quality control require more refined approaches in analyzing online information. In this paper, we propose evaluating the extent...
We present the Conformal Embedding Analysis (CEA) for feature extraction and dimensionality reduction. Incorporating both conformal mapping and discriminating analysis, CEA projec...
Extracting useful knowledge from large network datasets has become a fundamental challenge in many domains, from scientific literature to social networks and the web. We introduc...
Duen Horng Chau, Aniket Kittur, Jason I. Hong, Chr...