This paper presents a new web mining scheme for parallel data acquisition. Based on the Document Object Model (DOM), a web page is represented as a DOM tree. Then a DOM tree align...
Abstract. Collaborative Web Search (CWS) proposes a case-based approach to personalizing search results for the needs of a community of like-minded searchers. The search activities...
Parallel corpora are a valuable resource for tasks such as cross-language information retrieval and data-driven natural language processing systems. Previously only small scale cor...
In this work we compare different techniques to automatically find candidate web pages to substitute broken links. We extract information from the anchor text, the content of the p...
Since the website is one of the most important organizational structures of the Web, how to effectively rank websites has been essential to many Web applications, such as Web sear...