We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We introduce three measures for quantifying the alienness (...
Many applications in information retrieval, natural language processing, data mining, and related fields require a ranking of instances with respect to a specified criterion as op...
Developers of Semantic Web applications face a challenge with respect to the decentralised publication model: where to find statements about encountered resources. The “linked d...
This paper describes and evaluates various general stemming approaches for the French, Portuguese (Brazilian), German and Hungarian languages. Based on the CLEF test collections, ...
Melodies provide an important conceptual summarization of polyphonic audio. The extraction of melodic content has practical applications ranging from content-based audio retrieval...