In this paper, we propose a new system extracting potentially copyright infringement texts from the Web, called EPCI. EPCI extracts them in the following way: (1) generating a set...
Takashi Tashiro, Takanori Ueda, Taisuke Hori, Yu H...
As part of a large effort to acquire large repositories of facts from unstructured text on the Web, a seed-based framework for textual information extraction allows for weakly sup...
Peer-to-peer (P2P) Web search has gained a lot of interest lately, due to the salient characteristics of P2P systems, namely scalability, fault-tolerance and load-balancing. Howev...
In this paper, we present a system we have developed for automatic TV News video indexing that successfully combines results from the fields of speaker verification, acoustic anal...
This paper is to investigate the group behavior patterns of search activities based on Web search history data, i.e., clickthrough data, to boost search performance. We propose a ...