In this paper, we propose a new approach to discovering informative content from a set of tabular documents (or Web pages) of a Web site. Our system, InfoDiscoverer, first partition...
The page rank of a commercial web site has an enormous economic impact because it directly influences the number of potential customers that find the site as a highly ranked sear...
Web page classification is important to many tasks in information retrieval and web mining. However, applying traditional textual classifiers on web data often produces unsatisfyi...
This paper describes the results of an observational study into the methods people use to manage web information for re-use. People observed in our study used a diversity of metho...
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...