A critical problem in developing information agents for the Web is accessing data that is formatted for human use. We have developed a set of tools for extracting data from web si...
Craig A. Knoblock, Kristina Lerman, Steven Minton,...
The nearest neighbor and the perceptron algorithms are intuitively motivated by the aim of exploiting the “cluster” and “linear separation” structure of the data to...
In this paper, we propose a novel approach for understanding and analyzing online handwritten chemical formulas. With the structural characteristics, semantic rules, and more ...
This work is motivated by the need to automate the discovery of structure in vast and ever-growing collections of relational data commonly represented as graphs, for example ge...
Implicitly structured content on the Web such as HTML tables and lists can be extremely valuable for web search, question answering, and information retrieval, as the implicit str...