In this paper, we present a novel visual analytics system named Newdle with a focus on exploring large online news collections when the semantics of the individual news articles ha...
This paper introduces a method for automatically partitioning richly-formatted electronic documents. An automatic partitioning system has many potential uses, but we focus here on ...
Clustering web search engine results for ambiguous keyword searches poses unique challenges. First, we show that one cannot readily import the frequency based feature ranking to c...
Due to the enormous size of the web and low precision of user queries, finding the right information from the web can be difficult if not impossible. One approach that tries to ...
Usually a meaningful web topic has tens of thousands of comments, especially the hot topics. It is valuable if we congregate the comments into clusters and find out the mainstrea...