We propose a novel approach to find aliases of a given name from the web. We exploit a set of known names and their aliases as training data and extract lexical patterns that conv...
This paper proposes a method of crawling Web servers connected to the Internet without imposing a high processing load. We are using the crawler for a field survey of the digital ...
Katsuko T. Nakahira, Tetsuya Hoshino, Yoshiki Mika...
We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We propose 3 measures for quantifying the alienness (...
With the vast amount of potential relevant documents on the Web, a key question for a retrieval system is how to achieve a high accuracy retrieval under current Web setting. The w...
Measuring the information retrieval effectiveness of Web search engines can be expensive if human relevance judgments are required to evaluate search results. Using implicit user ...