Search Sciweavers | Sciweavers

311 search results - page 10 / 63

» Cleaning Web Pages for Effective Web Content Mining

ACSW
2004

192 views, Security Privacy

Discovering Parallel Text from the World Wide Web

15 years 8 months ago

Download crpit.com

Parallel corpus is a rich linguistic resource for various multilingual text management tasks, including crosslingual text retrieval, multilingual computational linguistics and mul...

Jisong Chen, Rowena Chau, Chung-Hsing Yeh

DOCENG
2009
ACM

139 views, Document Analysis

Web document text and images extraction using DOM analysis and natural language processing

16 years 1 month ago

Download www.hpl.hp.com

Parag Mulendra Joshi, Sam Liu. HP Laboratories, HPL-2009-187. Web page text extraction,...

Parag Mulendra Joshi, Sam Liu

COLCOM
2008
IEEE

121 views, Distributed And Parallel Com...

Web Canary: A Virtualized Web Browser to Support Large-Scale Silent Collaboration in Detecting Malicious Web Sites

15 years 9 months ago

Download mason.gmu.edu

Abstract. Malicious Web content poses a serious threat to the Internet, organizations and users. Current approaches to detecting malicious Web content employ high-powered honey cli...

Jiang Wang, Anup K. Ghosh, Yih Huang

ICDAR
2003
IEEE

127 views, Document Analysis

Identifying Story and Preview Images in News Web Pages

16 years 22 days ago

Download www.cse.salford.ac.uk

The World Wide Web provides an increasingly powerful and popular publication mechanism. Web documents often contain a large number of images serving various different purposes. Th...

Jianying Hu, Amit Bagga

KDD
2006
ACM

185 views, Data Mining

Understanding Content Reuse on the Web: Static and Dynamic Analyses

16 years 7 months ago

Download homepages.dcc.ufmg.br

Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents in the Web. The static and dynamic studies involve the analysis of similar c...

Ricardo A. Baeza-Yates, Álvaro R. Pereira J...
