In this paper we present a system for automatically integrating unstructured text into a multi-relational database using state-of-the-art statistical models for structure extracti...
We describe a compression technique for semistructured documents, called SCMPPM, which combines the Prediction by Partial Matching technique with the Structural Contexts Model (SCM) t...
In this paper we describe Berkeley's approach to the Domain Specific (DS) track for CLEF 2008. Last year we used Entry Vocabulary Indexes and Thesaurus expansion approac...
In this paper, we address the task of crosslingual semantic relatedness. We introduce a method that relies on the information extracted from Wikipedia, by exploiting the interlang...
Labeling text data is time-consuming but essential for automatic text classification. In particular, manually creating multiple labels for each document may become impractical ...