The software clustering problem has attracted much attention recently, since it is an integral part of the process of reverse engineering large software systems. A key problem in ...
This paper is concerned with efficient querying of very large multi-resolution datasets on storage and compute clusters. We present a suite of services that support storage, index...
This paper presents two novel features of an emergent data visualization method called "cellular ants": unsupervised data class labeling and shape negotiation. This metho...
Andrew Vande Moere, Justin James Clayden, Andy Don...
Statistical machine learning methods are employed to train a Named Entity Recognizer from annotated data. Methods like Maximum Entropy and Conditional Random Fields make use of fe...
We present a technique for augmenting annotated training data with hierarchical word clusters that are automatically derived from a large unannotated corpus. Cluster membership is...