The World Wide Web (WWW) has provided us with a plethora of information. However, given its unstructured format, this information is useful mainly to humans and cannot be effectiv...
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
Web spam is a widely-recognized threat to the quality and security of the Web. Web spam pages pollute search engine indexes, burden Web crawlers and Web mining services, and expos...
The standard method for making the full content of audio and video material searchable and is to annotate it with humangenerated meta-data that describes the content in a way that...
Personalized search is a promising way to better serve different users' information needs. Search history is one of the major information sources for search personalization. ...