In developing High-Performance Computing (HPC) software, time to solution is an important metric. This metric is comprised of two main components: the human effort required develo...
Web spider is a widely used approach to obtain information for search engines. As the size of the Web grows, it becomes a natural choice to parallelize the spider’s crawling proc...
We present a method of chunking in Korean texts using conditional random fields (CRFs), a recently introduced probabilistic model for labeling and segmenting sequence of data. In a...
In this paper a new rule-based approach to break assignment for the Russian language is discussed. It is a flexible and robust method of segmentation of texts in Russian in prosod...
Large quantities of documents in the Internet and digital libraries are simply scanned and archived in image format, many of which are packed in PDF files. The word search tool pr...