All information exchange on the Internet ? whether through full text, controlled vocabularies, ontologies, or other mechanisms ? ultimately requires that that an information provi...
Hierarchical topic taxonomies have proliferated on the World Wide Web [5, 18], and exploiting the output space decompositions they induce in automated classification systems is an...
Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for highdimensional sparse data commonly encountered in applications like t...
Unsupervised clustering can be significantly improved using supervision in the form of pairwise constraints, i.e., pairs of instances labeled as belonging to same or different clu...
Automated detection of the first document reporting each new event in temporally-sequenced streams of documents is an open challenge. In this paper we propose a new approach which...
Yiming Yang, Jian Zhang, Jaime G. Carbonell, Chun ...