Optical character recognition (OCR) remains a difficult problem for noisy documents or documents not scanned at high resolution. Many current approaches rely on stored font models...
Andrew Kae, Gary Huang, Erik Learned-Miller, Carl ...
In this paper, we propose a semi-supervised learning approach for distinguishing program-generated (bot) web search traffic from that of genuine human users. The work is motivated by...
Hongwen Kang, Kuansan Wang, David Soukal, Fritz Be...
As sophisticated enterprise applications move to the Web, some advanced user experiences become difficult to migrate due to prohibitively high computation, memory, and bandwidth r...
Daniel Coffman, Danny Soroker, Chandra Narayanaswa...
The World-Wide Web consists not only of a huge number of unstructured texts, but also a vast amount of valuable structured data. Web tables [2] are a typical type of structured in...
Cindy Xide Lin, Bo Zhao, Tim Weninger, Jiawei Han,...
Micro-blogging services provide platforms for users to share their feelings and ideas on the go. Designing to produce information stream in almost micro-blogging services, although...