Digital content is not only stored by servers on the Internet, but also on various embedded devices belonging to ubiquitous networks. In this paper, we propose a content processin...
We describe a method for improving the precision of metasearch results based upon scoring the visual features of documents' surrogate representations. These surrogate scores ...
Steven M. Beitzel, Eric C. Jensen, Ophir Frieder, ...
Abstract. Automated language identification of written text is a wellestablished research domain that has received considerable attention in the past. By now, efficient and effecti...
In this paper, we study the problem of Web forum crawling. Web forum has now become an important data source of many Web applications; while forum crawling is still a challenging ...
Yida Wang, Jiang-Ming Yang, Wei Lai, Rui Cai, Lei ...
A search engine that can handle TV programs and Web content in an integrated way is proposed. Conventional search engines have been able to handle Web content and/or data stored i...