The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
While we expect to discover knowledge in the texts available on the Web, such discovery usually requires many complex analysis steps, most of which require different text handling...
It is increasingly common for users to interact with the web using a number of different aliases. This trend is a doubleedged sword. On one hand, it is a fundamental building bloc...
Among the vast numbers of images on the web are many duplicates and near-duplicates, that is, variants derived from the same original image. Such near-duplicates appear in many we...
Jun Jie Foo, Justin Zobel, Ranjan Sinha, Seyed M. ...
Microblogs have become an important source of information for the purpose of marketing, intelligence, and reputation management. Streams of microblogs are of great value because o...