In order to search within corpora written in two or more languages, the simplest and most effective approach is to translate the submitted request into the required language(s). To...
In a federated digital library system, it is too expensive to query every accessible library. Resource selection is the task of deciding which libraries a query should be routed to....
Consistent and flawless communication between humans and machines is the precondition for a computer to process instructions correctly. While machines use well-defined languages an...
Information filtering has made considerable progress in recent years. The predominant approaches are content-based methods and collaborative methods. Researchers have largely conc...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
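To make the idea of a tag ratio concrete, here is a minimal sketch that assumes the ratio for a line is the number of non-tag characters divided by the number of HTML tags on that line, with a tag-free line treated as having one tag. The function name `tag_ratios` and the regex-based tag matching are illustrative assumptions, not the authors' implementation, and the sketch does not handle script or style blocks.

```python
import re

# Assumption: a "tag" is anything between '<' and '>' on a single line.
TAG_RE = re.compile(r"<[^>]*>")


def tag_ratios(html: str) -> list[float]:
    """Return one text-to-tag ratio per line of the HTML source."""
    ratios = []
    for line in html.splitlines():
        tag_count = len(TAG_RE.findall(line))
        # Characters left after removing tags, ignoring surrounding whitespace.
        text_chars = len(TAG_RE.sub("", line).strip())
        # Assumption: a line with no tags is treated as having one tag,
        # which also avoids division by zero.
        ratios.append(text_chars / max(tag_count, 1))
    return ratios


sample = (
    "<div><a href='x'>Home</a></div>\n"
    "<p>This is the main article text.</p>\n"
    "plain text"
)
print(tag_ratios(sample))
```

Under these assumptions, navigation-style lines with many tags and little text get low ratios, while lines dominated by running text get high ratios, which is the signal a content extractor could then threshold or cluster.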