We introduce NPIC, an image classification system that focuses on synthetic (e.g., non-photographic) images. We use class-specific keywords in an image search engine to create a no...
Unrestricted information flows are a key security weakness of current web design. Cross-site scripting, cross-site request forgery, and other attacks typically require that inform...
Terri Oda, Glenn Wurster, Paul C. van Oorschot, An...
Statistical machine learning techniques for data classification usually assume that all entities are i.i.d. (independent and identically distributed). However, real-world entities...
Web Page segmentation is a crucial step for many applications in Information Retrieval, such as text classification, de-duplication and full-text search. In this paper we describe...
We introduce a multi-stage ensemble framework, ErrorDriven Generalist+Expert or Edge, for improved classification on large-scale text categorization problems. Edge first trains a ...