As person names are non-unique, the same name on different Web pages might or might not refer to the same real-world person. This entity identification problem is one of the m...
Abstract: As web sites are getting more complicated, the construction of web information extraction systems becomes more troublesome and time-consuming. A common theme is the diffi...
To date, one of the main aims of the World Wide Web has been to provide users with information. In addition to private homepages, large professional information providers, includ...
In a traditional information retrieval system, it is assumed that queries can be posed about any topic. In reality, a large fraction of web queries are posed about a relatively sm...
Search engines--"web dragons"--are the portals through which we access society's treasure trove of information. They do not publish the algorithms they use to sort...