This paper describes a rather simplistic method of unsupervised morphological analysis of words in an unknown language. All what is needed is a raw text corpus in the given langua...
In this paper, we propose a framework for isolating text regions from natural scene images. The main algorithm has two functions: it generates text region candidates, and it veri...
Abstract. We aim to develop a technique to detect search engine optimization (SEO) spam websites. Specifically, we propose four methods for extracting the SEO spam entries from a ...
This paper addresses the issue of automatically extracting keyphrases from document. Previously, this problem was formalized as classification and learning methods for classific...
From the standpoint of the automated extraction of scientific knowledge, an important but little-studied part of scientific publications are the figures and accompanying captions....
William W. Cohen, Richard C. Wang, Robert F. Murph...