In this paper, we present a semi-supervised learning method for web page classification, leveraging click logs to augment training data by propagating class labels to unlabeled si...
Soo-Min Kim, Patrick Pantel, Lei Duan, Scott Gaffn...
Programmers frequently use the Web while writing code: they search for libraries, code examples, tutorials, and documentation. This link between code and visited Web pages remains...
To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. As a new attempt, this p...
Information extraction (IE) from semi-structured Web documents is a critical issue for information integration systems on the Internet. Previous work in wrapper induction aims to so...
Current Web search tools do a good job of retrieving documents that satisfy the wide range of intentions that people associate with a query – but do not do a very good job of di...