Many modern natural language-processing applications utilize search engines to locate large numbers of Web documents or to compute statistics over the Web corpus. Yet Web search e...
In this paper, we describe a methodology to estimate the geographic coverage of the web without the need for secondary knowledge or complex geo-tagging. This is achieved by random...
Robert Pasley, Paul Clough, Ross S. Purves, Floria...
Literature databases, scientific journals and com munication between researchers on the electronic level are rapidly developing tools in mathematics having high impact on the daily...
The retrieval of similar documents in the Web from a given document is different in many aspects from information retrieval based on queries generated by regular search engine use...
Felipe Bravo-Marquez, Gaston L'Huillier, Sebasti&a...
Information extraction (IE) from semi-structured Web documents is a critical issue for information integration systems on the Internet. Previous work in wrapper induction aim to so...