Information extraction (IE) from semi-structured Web documents is a critical issue for information integration systems on the Internet. Previous work in wrapper induction aim to so...
The real di culty in development of practical NLP systems comes from the fact that we do not have e ective means for gathering \knowledge". In this paper, we propose an algor...
Satoshi Sekine, Jeremy J. Carroll, Sophia Ananiado...
A new approach to handle unknown words in machine translation is presented. The basic idea is to find definitions for the unknown words on the source language side and translate t...
Statistical language models can learn relationships between topics discussed in a document collection and persons, organizations and places mentioned in each document. We present a...
David Newman, Chaitanya Chemudugunta, Padhraic Smy...
An overwhelming number of legal documents is available in digital form. However, most of the texts are usually only provided in a semi-structured form, i.e. the documents are stru...