The number of patent documents is currently rising rapidly worldwide, creating the need for an automatic categorization system to replace time-consuming and labor-intensive manual...
– We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
Abstract. In this paper a rigorous mathematical framework of deterministic annealing and mean-field approximation is presented for a general class of partitioning, clustering and ...
Abstract. This paper describes an efficient method to construct reliable machine learning applications in peer-to-peer (P2P) networks by building ensemble based meta methods. We co...
Abstract. Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. Among the computational tools recently develo...