We consider the coverage testing problem where we are given a document and a corpus with a limited query interface and asked to find if the corpus contains a near-duplicate of th...
Ali Dasdan, Paolo D'Alberto, Santanu Kolay, Chris ...
We describe a novel simple and highly scalable semi-supervised method called Word-Class Distribution Learning (WCDL), and apply it the task of information extraction (IE) by utili...
Yanjun Qi, Ronan Collobert, Pavel Kuksa, Koray Kav...
We present a novel approach to automatic information extraction from Deep Web Life Science databases using wrapper induction. Traditional wrapper induction techniques focus on lear...
Recently, most of malicious web pages include obfuscated codes in order to circumvent the detection of signature-based detection systems .It is difficult to decide whether the sti...
YoungHan Choi, TaeGhyoon Kim, SeokJin Choi, CheolW...
Middleware facilitates the development of distributed systems by accommodating heterogeneity, hiding distribution details and providing a set of common and domain specific service...