Dirty data is a serious problem for businesses, leading to incorrect decision making and inefficient daily operations, and ultimately wasting both time and money. Dirty data often ari...
We present DIADS, an integrated DIAgnosis tool for Databases and Storage area networks (SANs). Existing diagnosis tools in this domain have a database-only (e.g., [11]) or SAN-onl...
Scalable similarity search is the core of many large-scale learning or data mining applications. Recently, many research results have demonstrated that one promising approach is creatin...
Cloning in software systems is known to create problems during software maintenance. Several techniques have been proposed to detect the same or similar code fragments in software...
This paper reports on a study involving the automatic extraction of Chinese legal terms. We used a word-segmented corpus of Chinese court judgments to extract salient legal expres...