Much existing research on blogs focused on posts only, ignoring their comments. Our user study conducted on summarizing blog posts, however, showed that reading comments does chan...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
Extracting sentences that contain important information from a document is a form of text summarization. The technique is the key to the automatic generation of summaries similar ...
We report on a language resource consisting of 2000 annotated bibliography entries, which is being analyzed as part of our research on indicative document summarization. We show h...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...