This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with ...
Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning deterministic regular expressions from sets of ...
Geert Jan Bex, Wouter Gelade, Frank Neven, Stijn V...
In this paper we present a new document representation model based on implicit user feedback obtained from search engine queries. The main objective of this model is to achieve be...
Recent interests on XML, Semantic Web, and Web ontology, among other topics, have sparked a renewed interest on graph-structured databases. A fundamental query on graphs is the re...
Social annotation has gained increasing popularity in many Web-based applications, leading to an emerging research area in text analysis and information retrieval. This paper is c...