We present GoGetIt!, a tool for generating structure-driven crawlers that requires a minimum effort from the users. The tool takes as input a sample page and an entry point to a W...
Altigran Soares da Silva, Edleno Silva de Moura, J...
Web pages contain a combination of unique content and template material, which is present across multiple pages and used primarily for formatting, navigation, and branding. We stu...
Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This p...
We introduce a new variant of the popular Burrows-Wheeler transform (BWT) called Geometric Burrows-Wheeler Transform (GBWT). Unlike BWT, which merely permutes the text, GBWT conve...
For categorical data there does not exist any similarity measure which is as straight forward and general as the numerical distance between numerical items. Due to this it is ofte...