An increasing number of approaches for ontology engineering from text are gearing towards the use of online sources such as company intranet and the World Wide Web. Despite such r...
In many situations, individuals or groups of individuals are faced with the need to examine sets of documents to achieve understanding of their structure and to locate relevant in...
Alneu de Andrade Lopes, Roberto Pinho, Fernando Vi...
We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
Commercial OCR packages work best with highquality scanned images. They often produce poor results when the image is degraded, either because the original itself was poor quality,...
Written documents created through dictation differ significantly from a true verbatim transcript of the recorded speech. This poses an obstacle in automatic dictation systems as s...
Maximilian Bisani, Paul Vozila, Olivier Divay, Jef...