Sciweavers

ICDAR
1997
IEEE
15 years 10 months ago
Representing OCRed documents in HTML
OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
Tao Hong, Sargur N. Srihari
CIKM
2008
Springer
15 years 8 months ago
Predicting web spam with HTTP session information
Web spam is a widely-recognized threat to the quality and security of the Web. Web spam pages pollute search engine indexes, burden Web crawlers and Web mining services, and expos...
Steve Webb, James Caverlee, Calton Pu
DOCENG
2010
ACM
15 years 7 months ago
Contextual advertising for web article printing
HP Laboratories technical report HPL-2010-79. Keywords: printed ad, web printing, article extraction, conte...
Shengwen Yang, Jianming Jin, Parag Joshi, Sam Liu
ML
2007
ACM
15 years 6 months ago
Interactive learning of node selecting tree transducer
We develop new algorithms for learning monadic node selection queries in unranked trees from annotated examples, and apply them to visually interactive Web information extraction. ...
Julien Carme, Rémi Gilleron, Aurélie...
ACSAC
2010
IEEE
15 years 4 months ago
Cujo: efficient detection and prevention of drive-by-download attacks
The JavaScript language is a core component of active and dynamic web content in the Internet today. Besides its great success in enhancing web applications, however, JavaScript p...
Konrad Rieck, Tammo Krueger, Andreas Dewald