We apply the hypothesis of "One Sense Per Discourse" (Yarowsky, 1995) to information extraction (IE), and extend the scope of "discourse" from one single docum...
In this paper we analyze our recent research on the use of document analysis techniques for metadata extraction from PDF papers. We describe a package that is designed to extract ...
Social networks have recently attracted much attention for their importance to the Semantic Web. Several methods exist to extract social networks for people (particularly...
We propose a mathematical knowledge browser that helps people read mathematical documents. With the browser, printed mathematical documents can be scanned and recognized by OCR (O...
When manipulating data, as in supervised learning, we often extract new features from the original features for the purpose of reducing the dimensionality of the feature space and achieving ...