In this paper we study how we can design an effective parallel crawler. As the size of the Web grows, it becomes imperative to parallelize a crawling process, in order to finish d...
In this work, we investigate the relative hardness of short-text corpora in clustering problems and how this hardness relates to traditional similarity measures. Our appr...
Marcelo Luis Errecalde, Diego Ingaramo, Paolo Ross...
We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We propose 3 measures for quantifying the alienness (...
The SWRLTab is a development environment for working with SWRL rules in Protégé-OWL. It supports the editing and execution of SWRL rules. It also provides mechanisms to allow int...
Martin J. O'Connor, Samson W. Tu, Csongor Nyulas, ...
During software evolution, a collection of related artifacts with different representations is created. Some of these are composed of structured data (e.g., analysis data), some c...
Andrian Marcus, Andrea De Lucia, Jane Huffman Haye...