Abstract. Controlled vocabularies of various kinds (e.g., thesauri, classification schemes) play an integral part in making Cultural Heritage collections accessible. The various in...
In this paper, we sketch a method for clustering e-commerce search engines by the type of products/services they sell. This method utilizes the special features of interface pages...
In this paper, we describe how a large webmail service uses reputation to classify authenticated sending domains as either spammy or not spammy. Both SPF and DomainKey authenticat...
The Web is increasingly understood as a global information space consisting not just of linked documents, but also of Linked Data. More than just a vision, the resulting Web of Da...
Christian Bizer, Tom Heath, Kingsley Idehen, Tim B...
We discuss the problem of Web data extraction and describe an XML-based methodology whose goal extends far beyond simple "screen scraping." An ideal data extraction proc...