Duplicate URLs cause serious problems across the whole pipeline of a search engine, from crawling and indexing to result serving. URL normalization aims to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
Typical approaches for querying structured Web data collect (crawl) and pre-process (index) large amounts of data in a central data repository before allowing for query answering....
Andreas Harth, Katja Hose, Marcel Karnstedt, Axel ...
We investigate the degree to which modern web browsers are subject to “device fingerprinting” via the version and configuration information that they will transmit to website...
Scaling up document-image classifiers to handle an unlimited variety of document and image types poses serious challenges to conventional trainable classifier technologies. Highly...
Most database systems allow query processing over attributes that are derived at query runtime (e.g., user-defined functions and remote data calls to web services), making them e...
Justin J. Levandoski, Mohamed F. Mokbel, Mohamed E...