The Web contains an abundance of useful semistructured information about real world objects, and our empirical study shows that strong sequence characteristics exist for Web infor...
Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang, Wei-Y...
In this paper, we study the problem of learning block classification models to estimate block functions. We distinguish general models, which are learned across multiple sites, an...
Ocsigen is a framework for programming highly dynamic web sites in Objective Caml. It allows to program sites as Ocaml applications and introduces new concepts to take into accoun...
We present a novel approach to automatic information extraction from Deep Web Life Science databases using wrapper induction. Traditional wrapper induction techniques focus on lear...
In this paper, we propose to study the characteristics for analyzing subjective content in documents. For that purpose, we present and evaluate a novel method based on abstraction...