In this investigation we propose a novel summarization method for Web pages using hierarchical expression. We discuss the close relationship between summarization and hierarchical clust...
The Internet is one of the fastest growing areas of intelligence gathering. We present a statistical approach, called principal clusters analysis, for analyzing millions of user n...
Harris Wu, Michael D. Gordon, Kurt DeMaagd, Weiguo...
The ambiguity of person names on the Web has become a new area of interest for NLP researchers. This challenging problem has been formulated as the task of clustering Web search r...
We study the problem of automatically identifying “hotspots” on the real-time web. Concretely, we propose to identify highly-dynamic ad-hoc collections of users – what we ref...
In this paper we address the problem of unsupervised Web data extraction. We show that unsupervised Web data extraction becomes feasible when supposing pages that are made up of r...