Given a user keyword query, current Web search engines return a list of individual Web pages ranked by their "goodness" with respect to the query. Thus, the basic unit fo...
Ramakrishna Varadarajan, Vagelis Hristidis, Tao Li
In companies a large amount of information is maintained that is accessible via network communication tools. This makes searching for a particular piece of information a difficult t...
Patrick Lambrix, Nahid Shahmehri, Niclas Wahllö...
Most template detection methods process web pages in batches, so a newly crawled page cannot be processed until enough pages have been collected. This results in large storage c...
Yu Wang, Binxing Fang, Xueqi Cheng, Li Guo, Hongbo...
Web catalog integration is an emerging problem in current digital content management. Past studies show that more improvement on integration accuracy can be achieved with...
Hierarchies provide a means of organizing, summarizing and accessing information. We describe a method for automatically generating hierarchies from small collections of text, and...