The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
: This paper describes an ontology-based approach aiming at helping biologists to annotate their documents and at facilitating their information retrieval task. Our approach, based...
Online services such as web search, news portals, and ecommerce applications face the challenge of providing highquality experiences to a large, heterogeneous user base. Recent ef...
Link spam deliberately manipulates hyperlinks between web pages in order to unduly boost the search engine ranking of one or more target pages. Link based ranking algorithms such ...
We present SETS, an architecture for efficient search in peer-to-peer networks, building upon ideas drawn from machine learning and social network theory. The key idea is to arran...