We have produced a system that automatically incorporates syndicated materials from sources including library acquisition records and online news sites to form growing hypertextual...
Pratik Dave, Luis Francisco-Revilla, Unmil Karadka...
In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that can belong to any one of a number of...
Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, Q...
Search queries are typically very short, which means they are often underspecified or have senses that the user did not think of. A broad latent query aspect is a set of keywords ...
It is often highly valuable for organizations to have their data analyzed by external agents. However, any program that computes on potentially sensitive data risks leaking inform...
With the exponential growth of Web 2.0 applications, tags have been used extensively to describe the image contents on the Web. Due to the noisy and sparse nature in the human gene...