As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
To support the rapid growth of the web and e-commerce, W3C developed DOM as an application programming interface that the abstract, logical tree structure of an XML document. In t...
Many text documents naturally have two kinds of labels. For example, we may label web pages from universities according to their categories, such as "student" or "fa...
Having seen a news title "Alba denies wedding reports", how do we infer that it is primarily about Jessica Alba, rather than about weddings or reports? We probably reali...
With more and more reviews on the web, browsing through a mass of the related reviews becomes a heavy work. How to effectively analyzing and organizing these reviews attracts more...
Shu Zhang, Wen-Jie Jia, Yingju Xia, Yao Meng, Hao ...