Sciweavers

» Intelligent Document Processing
IDA
2009
Springer
16 years 1 month ago
Two-Way Grouping by One-Way Topic Models
We tackle the problem of new users or documents in collaborative filtering. Generalization over users by grouping them into user groups is beneficial when a rating is t...
Eerika Savia, Kai Puolamäki, Samuel Kaski
WWW
2005
ACM
16 years 8 days ago
Finding the boundaries of information resources on the web
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
Pavel Dmitriev, Carl Lagoze, Boris Suchkov
SIGIR
2010
ACM
15 years 10 months ago
Adaptive near-duplicate detection via similarity learning
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
SIGMOD
2003
ACM
16 years 6 months ago
Stream Processing of XPath Queries with Predicates
We consider the problem of evaluating large numbers of XPath filters, each with many predicates, on a stream of XML documents. The solution we propose is to lazily construct a sin...
Ashish Kumar Gupta, Dan Suciu
SADFE
2009
IEEE
16 years 1 month ago
Automating Disk Forensic Processing with SleuthKit, XML and Python
We have developed a program called fiwalk which produces detailed XML describing all of the partitions and files on a hard drive or disk image, as well as any extractable metadat...
Simson L. Garfinkel