The web crawler space is often delimited into two general areas: full-web crawling and focused crawling. We present netSifter, a crawler system which integrates features from thes...
We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative ...
Marek Kowalkiewicz, Maria E. Orlowska, Tomasz Kacz...
In bridging the digital divide, two important criteria are cost-effectiveness, and power optimization. While 802.11 is cost-effective and is being used in several installations in...
XML has become one of the core technologies for contemporary business applications, especially web-based applications. To facilitate processing of diverse XML data, we propose an ...
Quanzhong Li, Michelle Y. Kim, Edward So, Steve Wo...
In this paper we present HearSay, a system for browsing hypertext Web documents via audio. The HearSay system is based on our novel approach to automatically creating audio browsa...