The information revolution is creating and publishing vast data sets, such as records of business transactions, environmental statistics and census demographics. In many applicati...
This paper investigates the pre-conditions for successful combination of document representations formed from structural markup for the task of known-item search. As this task is ...
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
We consider the application of machine learning techniques for sequence modeling to Information Retrieval (IR) and surface Information Extraction (IE) tasks. We introduce a generi...
Massih-Reza Amini, Hugo Zaragoza, Patrick Gallinar...
In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that can belong to any one of a number of...
Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, Q...