Sciweavers

» Suffix Trees on Words
DRR
2010
15 years 8 months ago
Efficient automatic OCR word validation using word partial format derivation and language model
In this paper we present an OCR validation module, implemented for the System for Preservation of Electronic Resources (SPER) developed at the U.S. National Library of Medicine. ...
Siyuan Chen, Dharitri Misra, George R. Thoma
DCC
2007
IEEE
16 years 5 months ago
Simple Linear-Time Off-Line Text Compression by Longest-First Substitution
We consider grammar based text compression with longest first substitution, where non-overlapping occurrences of a longest repeating substring of the input text are replaced by a ...
Ryosuke Nakamura, Hideo Bannai, Shunsuke Inenaga, ...
PARA
2000
Springer
15 years 9 months ago
Parallel and Distributed Document Overlap Detection on the Web
Proliferation of digital libraries plus availability of electronic documents from the Internet have created new challenges for computer science researchers and professionals. Docum...
Krisztián Monostori, Arkady B. Zaslavsky, H...
ECIR
2006
Springer
15 years 7 months ago
Improving Quality of Search Results Clustering with Approximate Matrix Factorisations
In this paper we show how approximate matrix factorisations can be used to organise document summaries returned by a search engine into meaningful thematic categories. We...
Stanislaw Osinski
CLEIEJ
2008
15 years 6 months ago
Measuring Contribution of HTML Features in Web Document Clustering
Documents in HTML format have many features to analyze, from the terms in special sections to the phrases that appear in the whole document. However, it is important to decide whi...
Esteban Meneses, Oldemar Rodríguez-Rojas