Automatically categorizing documents into pre-defined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques ...
Sponsored search is one of the enabling technologies for today's Web search engines. It consists of matching and showing ads related to the user query on the search engine...
The social media site Flickr allows users to upload their photos, annotate them with tags, submit them to groups, and form social networks by adding other users as contact...
The challenge of similarity search in massive DNA sequence databases has inspired major changes in BLAST-style alignment tools, which accelerate search by inspecting only pairs of...
In many text retrieval tasks, it is highly desirable to obtain a "similarity profile" of the document collection for a given query. We propose sampling-based techniques ...