We investigate how to organize a large collection of geotagged photos, working with a dataset of about 35 million images collected from Flickr. Our approach combines content analy...
David J. Crandall, Lars Backstrom, Daniel P. Hutte...
Automatic categorization of user queries is an important component of general-purpose (Web) search engines, particularly for triggering rich, query-specific content and sponsored ...
This paper presents an extensive study of the evolution of textual content on the Web, which shows how some new pages are created from scratch while others are created using al...
In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...
In this paper, we describe an application, PubCloud, that uses tag clouds for the summarization of results from queries over the PubMed database of biomedical literature. PubCloud ...
Benjamin M. Good, Byron Yu-Lin Kuo, Mark D. Wilkin...