Sciweavers

Search results: Corpora and data preparation
ACHI
2008
IEEE
15 years 8 months ago
Specification for User Modeling with Self-Observing Systems
The complicated user interfaces and complex functionality of today's interactive products lead to a new class of failures: People do not understand their products and thus fail t...
Mathias Funk, Piet van der Putten, Henk Corporaal
CICLING
2008
Springer
15 years 8 months ago
A Semantics-Enhanced Language Model for Unsupervised Word Sense Disambiguation
An N-gram language model aims at capturing statistical word order dependency information from corpora. Although the concept of language models has been applied extensively to handl...
Shou-de Lin, Karin Verspoor
ACL
2008
15 years 7 months ago
Mining Wiki Resources for Multilingual Named Entity Recognition
In this paper, we describe a system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) t...
Alexander E. Richman, Patrick Schone
ACL
2007
15 years 7 months ago
Randomised Language Modelling for Statistical Machine Translation
A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirements are significantly below lossless information-theoretic lower bounds but it ...
David Talbot, Miles Osborne
CASCON
2007
15 years 7 months ago
Removing manually generated boilerplate from electronic texts: experiments with Project Gutenberg e-books
Collaborative work on unstructured or semistructured documents, such as in literature corpora or source code, often involves agreed-upon templates containing metadata. These templ...
Owen Kaser, Daniel Lemire