Abstract. A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and auto...
As XML has become an emerging standard for information exchange on the World Wide Web, it has gained attention in database communities to extract information from XML seen as a dat...
Tae-Sun Chung, Sangwon Park, Sang-Yong Han, Hyoung...
In this paper, we propose a framework for isolating text regions from natural scene images. The main algorithm has two functions: it generates text region candidates, and it veriï...
We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...
At least two kinds of relations exist among related words: taxonomical relations and thematic relations. Both relations identify related words useful to language understanding and...