This paper describes an exploratory, qualitative study of a process for extracting, identifying and exploiting an enterprise's implicit (less visible) web communities using l...
Wikipedia is the largest monolithic repository of human knowledge. In addition to its sheer size, it represents a new encyclopedic paradigm by interconnecting articles through hyp...
This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is a nontrivial and very important problem. Selecting proper...
There have been recent improvements in document technologies, such as the standardization of object interfaces to access and manipulate the properties of web documents. There has also...
Inverted index structures are the mainstay of modern text retrieval systems. They can be constructed quickly using off-line merge-based methods, and provide efficient support for ...