In today's Internet applications and sensor networks, we often encounter large amounts of data spread over many physically distributed nodes. The sheer volume of the data and ba...
Ashwin Lall, Haiquan (Chuck) Zhao, Jun Xu, Mitsuno...
Similarity joins have been studied as key operations in multiple application domains, e.g., record linkage, data cleaning, multimedia and video applications, and phenomena detectio...
Despite the best intentions of disk and RAID manufacturers, on-disk data can still become corrupted. In this paper, we examine the effects of corruption on database mana...
An ad hoc data format is any non-standard, semi-structured data format for which robust data processing tools are not available. In this paper, we present ANNE, a new kind of mark...
Calling context, the set of active methods on the stack, is critical for understanding the dynamic behavior of large programs. Dynamic program analysis tools, however, are almost ...