A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
Little work has attempted to improve the recognition of spontaneous, conversational speech by adding information from a loosely coupled modality. This study investi...
We investigate two seemingly incompatible approaches for improving document retrieval performance in the context of question answering: query expansion and query reduction. Querie...
We have developed an automated Japanese essay scoring system named jess. The system evaluates an essay on three features: (1) Rhetoric -- ease of reading, diversity of vocabular...
We investigate the optimal language model (LM) treatment of abundant filled pauses (FP) in spontaneous monologues from a professional dictation task. The questions addressed here are (1) how to deal wi...