Sciweavers

CIKM
2011
Springer
14 years 6 months ago
Probabilistic near-duplicate detection using simhash
This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Sadhan Sood, Dmitri Loguinov
CIKM
2008
Springer
15 years 8 months ago
Identifying table boundaries in digital documents via sparse line detection
Most prior work on information extraction has focused on extracting information from text in digital documents. However, often, the most important information being reported in an...
Ying Liu, Prasenjit Mitra, C. Lee Giles
ADBIS
2006
Springer
16 years 9 days ago
Multi-source Materialized Views Maintenance: Multi-level Views
In many information systems, the databases that make up the system are distributed in different modules or branch offices according to the requirements of the business enterprise. ...
Josep Silva, Jorge Belenguer, Matilde Celma
CIKM
2007
Springer
16 years 15 days ago
Randomized metric induction and evolutionary conceptual clustering for semantic knowledge bases
We present an evolutionary clustering method which can be applied to multi-relational knowledge bases storing resource annotations expressed in the standard languages for the Sema...
Nicola Fanizzi, Claudia d'Amato, Floriana Esposito
WWW
2007
ACM
16 years 7 months ago
Combining classifiers to identify online databases
We address the problem of identifying the domain of online databases. More precisely, given a set F of Web forms automatically gathered by a focused crawler and an online database...
Luciano Barbosa, Juliana Freire