The web crawler space is often divided into two general areas: full-web crawling and focused crawling. We present netSifter, a crawler system that integrates features from thes...
We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
Most discrete event simulation frameworks are able to output simulation runs as a trace. The Network Simulator 2 (NS2) is a prominent example that does so to decouple generation o...
We present an event-driven video adaptation system in this paper. Events are detected by audio/video analysis and annotated by the description schemes (DSs) provided by MPEG-7 Mul...
Min Xu, Jiaming Li, Yiqun Hu, Liang-Tien Chia, Bu-...
To exploit the similarity information hidden in the hyperlink structure of the web, this paper introduces algorithms scalable to graphs with billions of vertices on a distributed ...