An unsupervised clustering of the webpages on a website is a primary requirement for most wrapper induction and automated data extraction methods. Since page content can vary dras...
Matching regular expressions (regexps) is a very common workload. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in ever...
Max-margin Markov networks (M³N) have shown great promise in structured prediction and relational learning. Due to the KKT conditions, the M³N enjoys dual sparsity. However, the...
The de novo assembly of genomes from high-throughput short reads is an active area of research. Several promising methods have been recently developed, with applicability mainly re...
Benjamin G. Jackson, Patrick S. Schnable, Srinivas...
We start from the state-of-the-art Bag of Words pipeline that in the 2008 benchmarks of TRECvid and PASCAL yielded the best performance scores. We have contributed to that pipelin...
Jasper R. R. Uijlings, Arnold W. M. Smeulders, Rem...