Sciweavers

Query: Content-based language models for spoken document retrieval
CICLING
2010
Springer
Word Length n-Grams for Text Re-use Detection
Abstract. The automatic detection of shared content in written documents (which includes text reuse and its unacknowledged commitment, plagiarism) has become an important probl...
Alberto Barrón-Cedeño, Chiara Basile...
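The snippet is truncated before the method is described. Going only by the title, a minimal sketch of a word-length n-gram fingerprint might replace each token with its character length and compare n-grams of those lengths. The tokenizer, the default `n = 3`, and the containment score below are illustrative assumptions, not the authors' reported setup.

```python
import re


def word_length_ngrams(text: str, n: int = 3) -> set[tuple[int, ...]]:
    """Map each token to its length, then collect n-grams of lengths.

    Assumption: a token is a maximal run of word characters (regex \\w+).
    """
    lengths = [len(tok) for tok in re.findall(r"\w+", text)]
    return {tuple(lengths[i:i + n]) for i in range(len(lengths) - n + 1)}


def containment(suspicious: str, source: str, n: int = 3) -> float:
    """Fraction of the suspicious text's n-grams also found in the source.

    Assumption: containment is used here as a generic reuse score.
    """
    s = word_length_ngrams(suspicious, n)
    if not s:
        return 0.0
    return len(s & word_length_ngrams(source, n)) / len(s)
```

For example, "the cat sat on" yields the length sequence 3, 3, 3, 2. Its trigrams are (3, 3, 3) and (3, 3, 2). Only the second also appears in "a dog ran to", so containment is 0.5. Because the representation discards the words themselves, it is cheap to compare, but it also collides easily, which is why n matters.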
IJCNLP
2005
Springer
Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora
Abstract. We present a new implication of Wu’s (1997) Inversion Transduction Grammar (ITG) Hypothesis for the problem of retrieving truly parallel sentence translations from larg...
Dekai Wu, Pascale Fung
SIGIR
2009
ACM
Estimating query performance using class predictions
We investigate using topic prediction data, as a summary of document content, to compute measures of search result quality. Unlike existing quality measures such as query clarity ...
Kevyn Collins-Thompson, Paul N. Bennett
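The abstract contrasts its approach with query clarity, which scores a query by the KL divergence between a result-set model and a collection model. The following hypothetical sketch transposes that idea to topic classes. It averages the predicted class distributions of the top-ranked results and measures their divergence from a collection-wide class prior. The function name, input format, and averaging scheme are assumptions for illustration; the truncated snippet does not specify the paper's actual measures.

```python
import math


def class_clarity(
    result_probs: list[dict[str, float]],
    collection_prior: dict[str, float],
) -> float:
    """KL divergence (bits) of averaged result-class predictions from a prior.

    Hypothetical measure: higher values mean the top results concentrate
    on classes that are rare in the collection overall.
    Assumes collection_prior[c] > 0 for every class c that appears.
    """
    k = len(result_probs)
    if k == 0:
        return 0.0
    # Average the per-document class distributions over the top-k results.
    agg: dict[str, float] = {}
    for dist in result_probs:
        for c, p in dist.items():
            agg[c] = agg.get(c, 0.0) + p / k
    return sum(p * math.log2(p / collection_prior[c]) for c, p in agg.items() if p > 0)
```

A result list whose class profile matches the collection prior scores 0. A list concentrated on a class with prior 0.25 scores log2(4) = 2 bits.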
SIGIR
2009
ACM
Incorporating prior knowledge into a transductive ranking algorithm for multi-document summarization
This paper presents a transductive approach to learn ranking functions for extractive multi-document summarization. At the first stage, the proposed approach identifies topic th...
Massih-Reza Amini, Nicolas Usunier
WWW
2010
ACM
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han
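The abstract cuts off at "We describe how to comput...". A simplified sketch of the core quantity follows: a per-line tag ratio, taken as the number of non-tag characters divided by the number of HTML tags on that line. A tag-free line is counted as having one tag so the ratio stays finite. The regex-based tag matching is an assumption for illustration. So is the mean-ratio threshold in `extract_content`, a naive stand-in for whatever selection step the paper actually uses. Preprocessing such as removing script and style blocks is also omitted.

```python
import re

# Assumption: anything between '<' and '>' on one line counts as a tag.
TAG_RE = re.compile(r"<[^>]*>")


def tag_ratios(html: str) -> list[float]:
    """Return non-tag characters per tag for each line of the HTML source."""
    ratios = []
    for line in html.splitlines():
        n_tags = len(TAG_RE.findall(line))
        n_text = len(TAG_RE.sub("", line))
        # Treat a tag-free line as having one tag to avoid division by zero.
        ratios.append(n_text / max(n_tags, 1))
    return ratios


def extract_content(html: str) -> list[str]:
    """Keep the text of lines whose ratio exceeds the mean ratio (naive rule)."""
    lines = html.splitlines()
    ratios = tag_ratios(html)
    if not ratios:
        return []
    mean = sum(ratios) / len(ratios)
    return [TAG_RE.sub("", ln).strip() for ln, r in zip(lines, ratios) if r > mean]
```

On a two-line page, a navigation line such as `<div><a>x</a></div>` scores 0.25. A paragraph line with 35 characters of text inside a single `<p>` element scores 17.5. Only the paragraph survives the mean threshold. Long text with few tags is the signal the method's name points to.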