Search engines largely rely on Web robots to collect information from the Web. Due to the unregulated open-access nature of the Web, robot activities are extremely diverse. Such c...
This paper introduces a web image dataset created by NUS’s Lab for Media Search. The dataset includes: (1) 269,648 images and the associated tags from Flickr, with a total of 5,...
Cloud services accessed through mobile devices suffer from high network access latencies and are constrained by energy budgets dictated by the devices’ batteries. Radio and batt...
We present Opal, a light-weight framework for interactively locating missing web pages (http status code 404). Opal is an example of “in vivo” preservation: harnessing the col...
Abstract. The PageRank algorithm demonstrates the significance of the computation of document ranking of general importance or authority in Web information retrieval. However, doi...