Spectral clustering is a widely used method for organizing data that only relies on pairwise similarity measurements. This makes its application to non-vectorial data straightforw...
Fabian L. Wauthier, Nebojsa Jojic, Michael I. Jord...
Lighthouse is an on-line interface for a Web-based information retrieval system. It accepts queries from a user, collects the retrieved documents from the search engine, organizes...
We have studied the problem of linking event information across different languages without the use of translation systems or dictionaries. The linking is based on interlingua in...
It is generally believed that propagated anchor text is very important for effective Web search as offered by the commercial search engines. “Google Bombs” are a notable illus...
This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...