We develop a novel framework for the page-level template detection problem. Our framework is built on two main ideas. The first is the automatic generation of training data for a ...
Current keyword-oriented search engines for the World Wide Web do not allow specifying the semantics of queries. We address this limitation with NAGA, a new semantic search engi...
Gjergji Kasneci, Fabian M. Suchanek, Maya Ramanath...
XML Schema documents are defined using an XML syntax, which means that the idea of generating schema documentation through standard XML technologies is intriguing. We present X2Do...
We address the problem of extracting semantics of tags (short, unstructured text-labels assigned to resources on the Web) based on each tag's metadata patterns. In particul...
In this short note we demonstrate the applicability of hyperlink downweighting by means of language model disagreement. The method filters out hyperlinks with no relevance to the ...