This paper addresses the problem of topic distillation on the World Wide Web, namely, given a typical user query to find quality documents related to the query topic. Connectivity...
We describe ongoing research on segmenting and labeling HTML medical journal articles. In contrast to existing approaches in which HTML tags usually serve as strong indicators, we...
The problem of version detection is critical in many important application scenarios, including software clone identification, Web page ranking, plagiarism detection, and peer-to-...
Deise de Brum Saccol, Nina Edelweiss, Renata de Ma...
Effective extraction of query relevant information present within documents on the web is a nontrivial task. In this paper we present our system called QueSTS, which does the abov...
M. Sravanthi, C. Ravindranath Chowdary, P. Sreeniv...
Abstract. In this paper we show how approximate matrix factorisations can be used to organise document summaries returned by a search engine into meaningful thematic categories. We...