While several hierarchical classification methods have been applied to web content, such techniques invariably rely on a pre-defined taxonomy of documents. We propose a new techni...
A large fraction of the useful web comprises specification documents that largely consist of ⟨attribute name, numeric value⟩ pairs embedded in text. Examples include product in...
MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-...
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, ...
Query-based web search is an integral part of many people’s daily activities. Most do not realize that their search history can be used to identify them (and their interests). I...
Identifying highlights in multimedia content such as video and audio is currently a very difficult technical problem. We present and evaluate a novel algorithm that identifies hig...