As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
An important class of queries uses the LIKE predicate in SQL. In the absence of an index, LIKE queries must examine every row and therefore suffer performance degradation. The notion of indexing on substrings (...
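One generic way to index substrings is a q-gram inverted index, in which every length-q substring of a value points to the rows containing it. The sketch below illustrates that idea for `LIKE '%pattern%'` queries. It is not the method of the work summarized above. The names `QGramIndex` and `qgrams` and the choice `Q = 3` are hypothetical. Intersecting posting lists only narrows the candidates, so each candidate is re-checked with a real substring test.

```python
from collections import defaultdict

Q = 3  # hypothetical gram length chosen for illustration


def qgrams(s: str, q: int = Q) -> set[str]:
    """Return the set of all length-q substrings of s (empty if len(s) < q)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


class QGramIndex:
    """Hypothetical inverted index from q-grams to row ids."""

    def __init__(self, rows: list[str], q: int = Q):
        self.rows = rows
        self.q = q
        self.postings: dict[str, set[int]] = defaultdict(set)
        for row_id, text in enumerate(rows):
            for g in qgrams(text, q):
                self.postings[g].add(row_id)

    def contains(self, pattern: str) -> list[int]:
        """Row ids whose text contains `pattern`, i.e. LIKE '%pattern%'."""
        grams = qgrams(pattern, self.q)
        if not grams:
            # Pattern shorter than q: the index cannot help, so scan all rows.
            candidates = set(range(len(self.rows)))
        else:
            # Every q-gram of the pattern must occur in a matching row, so the
            # intersection of posting lists is a superset of the true answers.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Verify: q-gram co-occurrence is necessary but not sufficient.
        return sorted(i for i in candidates if pattern in self.rows[i])
```

For example, with `rows = ["database systems", "substring index", "string matching", "bass guitar", "abcxbcd"]`, `QGramIndex(rows).contains("string")` returns `[1, 2]`. `contains("abcd")` returns `[]`: row 4 contains both `abc` and `bcd`, so it passes the index filter, but the verification step rejects it.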
Organizing documents is a task we face daily as computer users. This is particularly true for the management of email. Typically, email documents are organized in director...
Current techniques for information security have limited capabilities to detect and counter attacks that involve different kinds of masquerade and spread of misinfor...
Term-based representations of documents have found widespread use in information retrieval. However, one of the main shortcomings of such methods is that they largely disregard le...
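The following is a minimal sketch of a term-based (bag-of-words) document representation, compared with cosine similarity. It assumes lowercase whitespace tokenization and raw term frequencies, both simplifications chosen for illustration. It is a generic example, not the representation proposed in the work above. The helper names `term_vector` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words: map each lowercase whitespace token to its frequency."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

Because only term counts are kept, `term_vector("the dog bit the man")` and `term_vector("the man bit the dog")` are equal, and their cosine similarity is 1.0 up to floating-point rounding. Documents that share no terms score 0.0.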