Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...
Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...
The GE NLToolset is a set of text interpretation tools designed to be easily adapted to new domains. This report summarizes the system and its performance on the MUC-4 task. INTR...
George B. Krupka, Paul S. Jacobs, Lisa F. Rau, Loi...
Web page segmentation is a crucial step for many applications in Information Retrieval, such as text classification, de-duplication and full-text search. In this paper we describe...
Users often try to accumulate information on a topic of interest from multiple information sources. In this case a user's informational need might be expressed in terms of an...
This paper will present an approach that fosters a seamless integration of documents with corporate information systems. It is based on a conceptually enhanced notion of documents...