Implicit information embedded in semantic web graphs, such as topography, clusters, and disconnected subgraphs, is difficult to extract from text files. Visualizations of the graph...
Hierarchies provide a means of organizing, summarizing and accessing information. We describe a method for automatically generating hierarchies from small collections of text, and...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
The goal of the INEX 2009 Book Track is to evaluate approaches for supporting users in reading, searching, and navigating the full texts of digitized books. The investiga...
Gabriella Kazai, Antoine Doucet, Marijn Koolen, Mo...
Most queries to text search engines are ranked or Boolean. Phrase querying is a powerful technique for refining searches, but is expensive to implement on conventional indexes. I...
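The last snippet notes that phrase querying is costly on conventional indexes. The sketch below shows why, using a generic textbook technique rather than the method of the truncated abstract. It builds a positional inverted index that maps each term to its per-document word offsets. A phrase query then intersects the candidate documents and checks that each term appears at the expected offset from the first term. The function names and the lowercase whitespace tokenizer are hypothetical simplifications.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [word positions]} (hypothetical helper).

    Assumes naive lowercase whitespace tokenization for illustration.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def phrase_query(index, phrase):
    """Return sorted ids of documents containing the exact phrase (hypothetical helper)."""
    terms = phrase.lower().split()
    # `in` on a defaultdict does not create keys, so unknown terms are detected safely.
    if not terms or any(t not in index for t in terms):
        return []

    # Step 1: document-level intersection, as a non-positional index would allow.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])

    # Step 2: positional check. Every candidate's position list for every term
    # must be scanned, which is where common words make phrase queries expensive.
    results = []
    for doc_id in sorted(candidates):
        # Possible start offsets of the phrase, taken from the first term.
        starts = set(index[terms[0]][doc_id])
        for i, t in enumerate(terms[1:], start=1):
            # Term i must occur exactly i words after a surviving start offset.
            starts &= {p - i for p in index[t][doc_id]}
            if not starts:
                break
        if starts:
            results.append(doc_id)
    return results


if __name__ == "__main__":
    idx = build_positional_index(
        ["the quick brown fox", "brown quick fox", "a quick brown dog"]
    )
    print(phrase_query(idx, "quick brown"))  # [0, 2]
    print(phrase_query(idx, "brown fox"))    # [0]
```

Document 1 contains both "quick" and "brown", but in the wrong order. The document-level intersection therefore keeps it, and only the positional check rejects it. This extra per-document work is the cost the snippet refers to.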