This paper discusses the use of character images to determine the parameters of an image degradation model. The acute angles in character images provide information used to find ...
We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...
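The abstract is cut off before it states the scoring criterion. The sketch below therefore only illustrates the general build-a-DOM-tree-then-score-nodes idea under a stated assumption. It is not the authors' method. The scoring rule (characters of text in a node's direct `<p>` children) and the names `TreeBuilder`, `score`, and `extract_main` are hypothetical. The tree is built with Python's standard-library `html.parser.HTMLParser`.

```python
from html.parser import HTMLParser

# HTML elements that never take a closing tag, so they are never left "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """A DOM element; children is a mixed list of text strings and Nodes."""

    def __init__(self, tag, parent=None, attrs=None):
        self.tag = tag
        self.parent = parent
        self.attrs = dict(attrs or [])
        self.children = []

    def text(self):
        """All text in this subtree, in document order."""
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a simple DOM tree from HTML using the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current, attrs)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open ancestor with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None:
            self.current = node.parent

    def handle_data(self, data):
        # Skip script and style contents; they are not article text.
        if self.current.tag not in ("script", "style"):
            self.current.children.append(data)


def iter_nodes(node):
    """Yield every element in the tree, depth first."""
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from iter_nodes(child)


def score(node):
    """Hypothetical score: characters of text in the node's direct <p> children."""
    return sum(len(c.text().strip()) for c in node.children
               if isinstance(c, Node) and c.tag == "p")


def extract_main(html):
    """Return the highest-scoring element as the presumed main article."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return max(iter_nodes(builder.root), key=score)


# Toy page for illustration only.
page = """
<html><body>
  <div id="nav"><p>Home</p><p>World</p></div>
  <div id="story">
    <p>The council approved the new budget on Monday.</p>
    <p>Opponents said the plan cuts too deeply into transit.</p>
  </div>
</body></html>
"""
main = extract_main(page)
print(main.tag, main.attrs.get("id"))
```

This rule scores only direct `<p>` children. That keeps outer wrappers such as `<body>` from winning just because they contain the whole page. A practical extractor could also discount link-heavy text such as navigation menus.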
This paper proposes a method of collecting a dozen terms that are closely related to a given seed term. The proposed method consists of three steps. The first step, compiling cor...
Many documents are available to a computer only as images scanned from paper. However, most natural language processing systems expect their input as character-coded text, which may be di...
Data-driven function tag assignment has been studied for English using Penn Treebank data. In this paper, we address the question of whether such methods can be applied to other la...