Extracting entities (such as people, movies) from documents and identifying the categories (such as painter, writer) they belong to enable structured querying and data analysis ov...
We present a browser-extending Semantic Web extraction system that maps HTML documents to tables and, where possible, to rules. First, the basic data extractor ViPER distills and ...
In the past few years, much attention has been paid to the study of semistructured data, i.e., data with irregular, possibly unstable, and rapidly changing structure, and, in part...
The National Taiwan University Library has built a digital library of historical documents about Taiwan. The content is unique in that it covers about 80% of all primary Chinese hi...
A significant amount of information is stored in computer systems today, but people are struggling to manage their documents such that the information is easily found. XML is a de...