Given only the URL of a web page, can we identify its topic? This is the question that we examine in this paper. Usually, web pages are classified using their content [7], but a U...
For extracting the characteristics a specific geographic entity, and notably a place, we propose to use dynamic Extreme Tagging Systems in combination with the classic approach of...
We consider the problem of indexing high-dimensional data for answering (approximate) similarity-search queries. Similarity indexes prove to be important in a wide variety of sett...
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
This paper presents a transaction-time HTTP server, called ? Apache that supports document versioning. A document often consists of a main file formatted in HTML or XML and severa...