As with any application of machine learning, web search ranking requires labeled data. The labels usually come in the form of relevance assessments made by editors. Click logs can...
Web pages are more than text and they contain much contextual and structural information, e.g., the title, the meta data, the anchor text, etc., each of which can be seen as a dat...
We present an end-to-end system that extracts a user’s social network and its members’ contact information given the user’s email inbox. The system identifies unique people...
This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of web content. The hierarchical structure is initially used to train diffe...
Search result diversification is a natural approach for tackling ambiguous queries. Nevertheless, not all queries are equally ambiguous, and hence different queries could bene...