The top-k similarity joins have been extensively studied and used
in a wide spectrum of applications such as information retrieval, decision
making, spatial data analysis and dat...
The query equivalence problem has been studied extensively for set-semantics and, more recently, for bag-set semantics. However, SQL queries often combine set and bag-set semantic...
In this paper, we propose a new similarity measure to compute the pairwise similarity of text-based documents based on suffix tree document model. By applying the new suffix tree ...
Government regulations are semi-structured text documents that are often voluminous, heavily cross-referenced between provisions and even ambiguous. Multiple sources of regulation...
Statistical measures of word similarity have application in many areas of natural language processing, such as language modeling and information retrieval. We report a comparative...