Wikipedia is an example of the collaborative, semi-structured data sets emerging on the Web. These data sets have large, nonuniform schemas that require costly data integra...
Bryan Chan, Leslie Wu, Justin Talbot, Mike Cammara...
The retrieval of similar documents from large-scale datasets has been one of the main concerns in knowledge management environments, such as plagiarism detection, news impact a...
Felipe Bravo-Marquez, Gaston L'Huillier, Sebasti&a...
We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text...
The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a pa...
We present a new ranking algorithm that combines the strengths of two previous methods: boosted tree classification, and LambdaRank, which has been shown to be empiricall...
Qiang Wu, Christopher J. C. Burges, Krysta Marie S...