Automated text categorization is an important technique for many web applications, such as document indexing, document filtering, and cataloging web resources. Many different appr...
On an abstract level, XML Schema increases the limited expressive power of Document Type Definitions (DTDs) by extending them with a recursive typing mechanism. However, an invest...
Geert Jan Bex, Wim Martens, Frank Neven, Thomas Sc...
With the falling price of the memory, an increasing number of multimedia servers and proxies are now equipped with a large memory space. Caching media objects in the memory of a p...
Traditionally, research in identifying structured entities in documents has proceeded independently of document categorization research. In this paper, we observe that these two t...
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...