Abstract. Governments often hold very rich data, and whilst much of this information is published and available for re-use by others, it is often trapped by poor data structures, lo...
Harith Alani, David Dupplaw, John Sheridan, Kieron...
In this poster, we present an information extraction engine for web-based forums. The engine analyzes the HTML files crawled from web forums, deduces the wrapper (template) of the...
Hanny Yulius Limanto, Nguyen Ngoc Giang, Vo Tan Tr...
In this work, we compare different techniques for automatically finding candidate web pages to replace broken links. We extract information from the anchor text, the content of the p...