On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
We consider the problem of template-independent news extraction. The state-of-the-art news extraction method is based on template-level wrapper induction, which has two serious li...
Junfeng Wang, Xiaofei He, Can Wang, Jian Pei, Jiaj...
Geographical context is very important for images. Millions of images on the Web have been already assigned latitude and longitude information. Due to the rapid proliferation of s...
Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of tr...
Jun Zhu, Zaiqing Nie, Xiaojiang Liu, Bo Zhang, Ji-...