Information extraction from HTML pages has been conventionally treated as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
Text Mining is one of the best solutions for today and the future’s information explosion. With the development of modern processor technologies, it will be a mass market deskto...
This paper addresses the problem of extracting information from textual documents, either normal documents or web pages. A new approach for extracting complicate information from ...
Luo Xiao, Dieter Wissmann, Michael Brown, Stefan J...
Information extraction (IE) — the problem of extracting structured information from unstructured text — has become an increasingly important topic in recent years. A SIGMOD 20...
Laura Chiticariu, Yunyao Li, Sriram Raghavan, Fred...
Even in a massive corpus such as the Web, a substantial fraction of extractions appear infrequently. This paper shows how to assess the correctness of sparse extractions by utiliz...