Learning temporal graph structures from time series data reveals important dependency relationships between current observations and histories. Most previous work focuses on learn...
Data clustering is a difficult problem due to the complex and heterogeneous natures of multidimensional data. To improve clustering accuracy, we propose a scheme to capture the lo...
We improve on previous recommender systems by taking advantage of the layered structure of software. We use a random-walk approach, mimicking the more focused behavior of a develo...
Zachary M. Saul, Vladimir Filkov, Premkumar T. Dev...
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...