Vocabulary restrictions in large vocabulary continuous speech recognition (LVCSR) systems mean that out-of-vocabulary (OOV) words are lost in the output. However, OOV words tend t...
Carolina Parada, Abhinav Sethy, Mark Dredze, Frede...
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
This work aims a two-fold contribution: it presents a software to analyse logfiles and visualize popular web hot spots and, additionally, presents an algorithm to use this informa...
D. Avramouli, John D. Garofalakis, Dimitris J. Kav...
Web page classification is important to many tasks in information retrieval and web mining. However, applying traditional textual classifiers on web data often produces unsatisfyi...
We define an operational semantics and a type system for manipulating semistructured data that contains hidden information. The data model is simple labeled trees with a hiding op...