In this article, we present the methods we used to obtain measures that ensure the quality and well-formedness of a text corpus. These measures allow us to determine the compatib...
Software development is prone to time-consuming and expensive errors. Finding and correcting errors in a program (debugging) is usually done by executing the program with differen...
Speech interfaces to question-answering systems offer significant potential for finding information with phones and mobile networked devices. We describe a demonstration of spok...
Building data integration systems today is largely done by hand, in a very labor-intensive and error-prone process. In this paper, we describe a conceptually new solution to this ...
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing f...