Traditionally, machine learning approaches for information extraction require human-annotated data, which can be costly and time-consuming to produce. However, in many cases, there ...
Many applications in text processing require significant human effort for either labeling large document collections (when learning statistical models) or extrapolating rules from...
Category Ranking is a variant of the multi-label classification problem, in which, rather than performing a (hard) assignment to an object of categories from a predefined set...
We present a scalable algorithm for the parallel computation of inverted files for large text collections. The algorithm takes into account an environment of a high-bandwidth netw...
Berthier A. Ribeiro-Neto, Joao Paulo Kitajima, Gon...
We introduce a multi-stage ensemble framework, Error-Driven Generalist+Expert or Edge, for improved classification on large-scale text categorization problems. Edge first trains a ...