Vast amounts of text on the Web are unstructured and ungrammatical, such as classified ads, auction listings, forum postings, etc. We call such text “posts.” Despite their in...
A major obstacle to fully integrated deployment of many data mining algorithms is the assumption that data sits in a single table, even though most real-world databases have compl...
Alexandrin Popescul, Lyle H. Ungar, Steve Lawrence...
Background: Although there are a large number of thesauri for the biomedical domain many of them lack coverage in terms and their variant forms. Automatic thesaurus construction b...
This paper presented an overview of Chinese bi-character words' morphological types, and proposed a set of features for machine learning approaches to predict these types bas...
For describing and analyzing digital images of paintings we propose a model to serve as the basis for an interactive image retrieval system. The model defines two types of feature...
Thomas E. Lombardi, Sung-Hyuk Cha, Charles C. Tapp...