A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Contextual advertising on web pages has recently become very popular, and it poses its own set of unique text mining challenges. Often advertisers wish to either target (or avoid) ...
Yi Zhang, Arun C. Surendran, John C. Platt, Mukund...
Advances in imaging techniques have led to large repositories of images. There is an increasing demand for automated systems that can analyze complex medical images and extract me...
Discovering a representative set of theme patterns from a large amount of text for interpreting their meaning has always been a concern of researchers in both data mining and inform...
Yongxin Tong, Shilong Ma, Dan Yu, Yuanyuan Zhang, ...
Programs usually follow many implicit programming rules, most of which are too tedious to be documented by programmers. When these rules are violated by programmers who are unawar...