It has been observed that precision increases with collection size. One explanation could be that the redundancy of information increases, making it easier to find multiple docum...
Simulation optimization (SO) is the process of finding the optimum design of a system whose performance measure(s) are estimated via simulation. We propose some ideas to improve o...
There is an exploding amount of user-generated content on the Web due to the emergence of "Web 2.0" services, such as Blogger, MySpace, Flickr, and del.icio.us. The part...
Ka Cheung Sia, Junghoo Cho, Yun Chi, Belle L. Tsen...
The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, doma...
Oren Etzioni, Michael J. Cafarella, Doug Downey, A...
Abstract. This paper describes a new way of implementing an intelligent web caching service, based on an analysis of usage. Since the cache size in software is limited, and the sea...