Improving data quality is a time-consuming, labor-intensive, and often domain-specific operation. Existing data repair approaches are either fully automated or not efficient in int...
Mohamed Yakout, Ahmed K. Elmagarmid, Jennifer Nevi...
Support vector machines (SVMs) have been widely used in multimedia retrieval to learn a concept in order to find the best matches. In such an SVM active learning environment, the ...
Collections are a fundamental tool for reproducible evaluation of information retrieval techniques. We describe a new method for distributing the document lengths and term counts ...
Citation matching, or the automatic grouping of bibliographic references that refer to the same document, is a data management problem faced by automatic digital libraries for sci...
Isaac G. Councill, Huajing Li, Ziming Zhuang, Sand...
Online communities have become popular for publishing and searching content, as well as for finding and connecting to other users. User-generated content includes, for example, pe...
Ralf Schenkel, Tom Crecelius, Mouna Kacimi, Sebast...