Previous research has taught us that the typical nonprofessional information seeker on the World Wide Web submits very short queries, resulting in low-precision results. We show th...
Summarization of text documents is increasingly important given the amount of data available on the Internet. The large majority of current approaches view documents as linear sequ...
Current peer-to-peer (p2p) full-text keyword search techniques fall into the following categories: document-based partitioning, keyword-based partitioning, hybrid indexing, and se...
Three join algorithms are evaluated in an environment with distributed main-memory-based mediators and data sources. A streamed ship-out join ships bulks of tuples to a mediator ne...
We describe ongoing research on segmenting and labeling HTML medical journal articles. In contrast to existing approaches in which HTML tags usually serve as strong indicators, we...