Shannon's Noisy-Channel model, which describes how a corrupted message might be reconstructed, has been the corner stone for much work in statistical language and speech proc...
Automatically segmenting unstructured text strings into structured records is necessary for importing the information contained in legacy sources and text collections into a data ...
Structured link vector model (SLVM) is a recently proposed document representation that takes into account both structural and semantic information for measuring XML document simi...
We consider the problem of eliminating redundant Boolean features for a given data set, where a feature is redundant if it separates the classes less well than another feature or ...
Annalisa Appice, Michelangelo Ceci, Simon Rawles, ...
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalizatio...