Many information retrieval and machine learning methods have not evolved in order to be applied to the Web. Two main problems in applying some machine learning techniques for W...
In this paper, we propose a new approach, called FiVaTech, for the problem of Web data extraction. FiVaTech is a page-level data extraction system that deduces the data schema an...
Mohammed Kayed, Chia-Hui Chang, Khaled F. Shaalan,...
Distributed crawling has shown that it can overcome important limitations of the centralized crawling paradigm. However, the distributed nature of current distributed cra...
Extremists’ exploitation of computer-mediated communications such as online forums has recently gained much attention from academia and the government. However, due to the cover...
Large-scale Parallel Web Search Engines (WSEs) need to adopt a strategy for partitioning the inverted index among a set of parallel server nodes. In this paper we are interested ...