With the vast number of potentially relevant documents on the Web, a key question for a retrieval system is how to achieve high-accuracy retrieval in the current Web setting. The w...
The goal of this work is to study the feasibility of a Heterogeneous Data Classification and Search (HDCS) system and to provide a possible design for its implementation. In order t...
Dorin Carstoiu, Alexandra Cernian, Adriana Olteanu...
In this paper, we propose a new system, called EPCI, that extracts potentially copyright-infringing texts from the Web. EPCI extracts them in the following way: (1) generating a set...
Takashi Tashiro, Takanori Ueda, Taisuke Hori, Yu H...
This paper presents Multilingual Document Clustering (MDC) on comparable corpora. Wikipedia, a structured multilingual knowledge base, has been heavily exploited in many monolingual...
Background: Recent progress in cDNA and EST sequencing is yielding a deluge of sequence data. Like database search results and proteome databases, this data gives rise to inferred...
Michael Spitzer, Stefan Lorkowski, Paul Cullen, Al...