Previous anti-spamming algorithms based on link structure suffer from either the weakness of the page value metric or the vagueness of the seed selection. In this paper, we propos...
We present three general approaches to detecting prototypical entities in a given taxonomy and apply them to a music information retrieval (MIR) problem. More precisely, we try...
On script-generated web sites, many documents share a common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
The aim of this paper is to present a new tool for multiple instance learning, designed using a grammar-based genetic programming (GGP) algorithm. We study its app...
Parallel corpora are a valuable resource for tasks such as cross-language information retrieval and data-driven natural language processing systems. Previously only small scale cor...