Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated c...
Erik Linstead, Paul Rigor, Sushil Krishna Bajracha...
In plenty of scenarios, data can be represented as vectors mathematically abstracted as points in a Euclidean space. Because a great number of machine learning and data mining app...
To cope with the large amount of biological sequences being produced, a significant number of genes and proteins have been annotated by automated tools. A protein annotation is an...
The amount of biological data publicly available has experienced an exponential growth as the technology advances. Online databases are now playing an important role as informatio...
The major challenge in mining data streams is the issue of concept drift, the tendency of the underlying data generation process to change over time. In this paper, we propose a g...