In the paper we investigate the impact of data size on a Word Sense Disambiguation task (WSD). We question the assumption that the knowledge acquisition bottleneck, which is known...
This paper presents the details of a pilot study in which we tagged portions of the American National Corpus (ANC) for idioms composed of verb-noun constructions, prepositional ph...
Laura Street, Nathan Michalov, Rachel Silverstein,...
We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set P, our goal is to find a set C of size k such that the s...
High throughput biotechnologies have enabled scientists to collect a large number of genetic and phenotypic attributes for a large collection of samples. Computational methods are...
Using gene expression data to classify tumor types is a very promising tool in cancer diagnosis. Previous works show several pairs of tumor types can be successfully distinguished...
Chen-Hsiang Yeang, Sridhar Ramaswamy, Pablo Tamayo...