Most up-to-date well-behaved topic-based summarization systems are built upon the extractive framework. They score the sentences based on the associated features by manually assig...
In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...
The National Institute of Standards and Technology (NIST) has developed a form-based handprint recognition system for reading information written on forms. This public domain soft...
Image classification is a well-studied and hard problem in computer vision. We extend a proven solution for classifying web spam to handle images. We exploit the link structure of...
Consider multiple users searching for a hotel room, based on size, cost, distance to the beach, etc. Users may have variable preferences expressed by different weights on the attri...