Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
Web Mining Systems exploit the redundancy of data published on the Web to automatically extract information from existing web documents. The first step in the Information Extract...
Kostyantyn M. Shchekotykhin, Dietmar Jannach, Gerh...
We experimentally evaluate randomization-based approaches to creating an ensemble of decision-tree classifiers. Unlike methods related to boosting, all of the eight approaches co...
Lawrence O. Hall, Kevin W. Bowyer, Robert E. Banfi...
So far, most image mining was based on interactive querying. Although
such querying will remain important in the future, several
applications need image mining at such wide scale...
Luc J. Van Gool, Michael D. Breitenstein, Stephan ...
The Social Informatics Data Grid (SIDGrid) is a new cyberinfrastructure designed to transform how social and behavioral scientists collect and annotate data, collaborate and share...