We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
Eavesdropping on electronic communication is usually prevented by using cryptography-based mechanisms. However, these mechanisms do not prevent one from obtaining private informat...
In this paper, we report the development and experiments of IBM Content Harvester (CH), a tool to analyze and recover templates and content from word processor created text docume...
Real world images are complex objects, difficult to describe but at the same time possessing a high degree of redundancy. A very recent study [1] on the statistical properties of n...
We participated to the TREC-X QA main task and list task with a new system named QUANTUM, which analyzes questions with shallow parsing techniques and regular expressions. Instead...