Sciweavers

1486 search results - page 98 / 298
» A Document as a Small World
CEAS
2007
Springer
16 years 19 days ago
Hardening Fingerprinting by Context
Near-duplicate detection is not only an important pre- and post-processing task in Information Retrieval but also an effective spam-detection technique. Among different approache...
Aleksander Kolcz, Abdur Chowdhury
SIGIR
2010
ACM
15 years 10 months ago
Adaptive near-duplicate detection via similarity learning
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
CLIN
2001
15 years 7 months ago
Applying Monte Carlo Techniques to Language Identification
Two major stages in language identification systems can be identified: the language modeling stage, where the distinctive features of languages are determined and stored in...
Arjen Poutsma
CORR
2006
Springer
15 years 6 months ago
Automatic annotation of multilingual text collections with a conceptual thesaurus
Automatic annotation of documents with controlled vocabulary terms (descriptors) from a conceptual thesaurus is not only useful for document indexing and retrieval. The mapping of...
Bruno Pouliquen, Ralf Steinberger, Camelia Ignat
JAIR
2010
15 years 4 months ago
Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback
While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimension...
Sajib Dasgupta, Vincent Ng