Abstract. In this paper we describe a methodology for harvesting information from large distributed repositories (e.g., large Web sites) with minimal user intervention. The methodol...
Fabio Ciravegna, Sam Chapman, Alexiei Dingli, Yori...
For languages with rich content on the web, business reviews are easily accessible via many well-known websites, e.g., Yelp.com. For languages with poor content on the web, like Arab...
We introduce a unified graph representation of the Web, which includes both structural and usage information. We model this graph using a simple union of the Web's hyperlink ...
Barbara Poblete, Carlos Castillo, Aristides Gionis
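As a rough illustration of the idea in the abstract above, the following minimal sketch merges structural edges (hyperlinks) and usage edges (observed page-to-page transitions) into one directed graph. It is not the authors' model: the function name `union_graph`, the edge attributes `link` and `visits`, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def union_graph(hyperlinks, transitions):
    """Merge structural and usage edges into a single directed graph.

    hyperlinks:  iterable of (source, target) pairs taken from page links.
    transitions: iterable of (source, target) pairs observed in usage logs.
    Both input formats are hypothetical, chosen only for illustration.
    """
    # Each edge records whether a hyperlink exists and how often it was traversed.
    graph = defaultdict(lambda: {"link": False, "visits": 0})
    for src, dst in hyperlinks:
        graph[(src, dst)]["link"] = True
    for src, dst in transitions:
        graph[(src, dst)]["visits"] += 1
    return dict(graph)


# Toy data: two hyperlinks and three logged transitions.
hyperlinks = [("a", "b"), ("b", "c")]
transitions = [("a", "b"), ("a", "b"), ("c", "a")]
g = union_graph(hyperlinks, transitions)
```

In this sketch an edge such as `("c", "a")` can exist with `link` set to `False`, which captures navigation that does not follow a hyperlink, while `("b", "c")` is a hyperlink that no logged user followed.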
This case study examines the early adoption and use of micro-blogging in a Fortune 500 company. The study used several independent data sources: five months of empirical micro-b...
Long-term search history contains rich information about a user's search preferences. In this paper, we study methods based on statistical language modeling to mine contextual i...