Community QA portals provide an important resource for non-factoid question answering. The inherent noisiness of user-generated data makes the identification of high-quality cont...
We propose a novel approach to find aliases of a given name from the web. We exploit a set of known names and their aliases as training data and extract lexical patterns that conv...
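The pattern-extraction idea in the alias abstract can be illustrated with a minimal sketch. It assumes each training example is a (name, alias, snippet) triple in which the name precedes the alias, and it treats the tokens between them as a candidate lexical pattern. The function `extract_patterns`, the `max_gap` limit, the `[NAME]`/`[ALIAS]` placeholders and the toy snippets are all hypothetical illustrations. This is not the paper's algorithm, whose details the truncated abstract does not give.

```python
import re
from collections import Counter
from typing import Iterable, Tuple


def extract_patterns(
    examples: Iterable[Tuple[str, str, str]], max_gap: int = 4
) -> Counter:
    """Count lexical patterns linking a known name to one of its aliases.

    Hypothetical helper. Each example is (name, alias, snippet). Wherever the
    name is followed by the alias with at most `max_gap` whitespace-separated
    tokens in between, those tokens become a pattern such as
    "[NAME] aka [ALIAS]". Only the name-before-alias order is handled.
    """
    patterns: Counter = Counter()
    for name, alias, snippet in examples:
        # Non-greedy gap: each match stops at the first alias after the name.
        regex = re.escape(name) + r"(.*?)" + re.escape(alias)
        for match in re.finditer(regex, snippet):
            gap = match.group(1).split()
            if len(gap) <= max_gap:
                patterns[" ".join(["[NAME]", *gap, "[ALIAS]"])] += 1
    return patterns


# Toy training triples, invented for illustration only.
training = [
    ("Hideki Matsui", "Godzilla", "Hideki Matsui aka Godzilla hit a home run"),
    ("Will Smith", "Fresh Prince", "Will Smith aka Fresh Prince starred in it"),
    ("Hideki Matsui", "Godzilla", "Hideki Matsui, nicknamed Godzilla, signed"),
]
patterns = extract_patterns(training)
```

Patterns that recur across many known pairs (here "[NAME] aka [ALIAS]") are the natural candidates to reuse when searching for aliases of a new name. Ranking or filtering them is left out of this sketch.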
We consider the problem of deep web source selection and argue that existing source selection methods are inadequate as they are based on local similarity assessment. Specificall...
Recently, much work has been reported on integrating the search interfaces of multiple Web databases of the same domain into an integrated interface. Such integrated interfaces ...
We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment th...
Filippo Geraci, Marco Pellegrini, Paolo Pisati, Fa...
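For orientation, the clustering problem stated in the metric-space abstract (n documents, k non-overlapping clusters) has a classic baseline: Gonzalez's furthest-point-first (FPF) heuristic for k-center. It keeps the maximum cluster radius within a factor of 2 of the optimum. The sketch below implements only that textbook baseline. The abstract's own augmentation is truncated and is not reproduced here. The function name `furthest_point_first` and the choice of the first point as the initial center are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def furthest_point_first(
    points: Sequence[T], k: int, dist: Callable[[T, T], float]
) -> Tuple[List[int], List[int]]:
    """Gonzalez's furthest-point-first heuristic for metric k-center.

    Returns (centers, labels): indices of the chosen centers and, for each
    point, the position in `centers` of its nearest center. May return fewer
    than k centers if every point already coincides with a center.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(points)")
    centers = [0]                                   # first center: arbitrary (point 0)
    labels = [0] * n                                # all points start in cluster 0
    nearest = [dist(p, points[0]) for p in points]  # distance to nearest center
    while len(centers) < k:
        far = max(range(n), key=nearest.__getitem__)  # worst-served point
        if nearest[far] == 0:
            break                                   # no point left to separate
        centers.append(far)
        c = len(centers) - 1
        for i, p in enumerate(points):
            d = dist(p, points[far])
            if d < nearest[i]:                      # new center strictly closer
                nearest[i] = d
                labels[i] = c
    return centers, labels


# Usage on 1-D points with absolute difference as the metric.
centers, labels = furthest_point_first(
    [0.0, 1.0, 10.0, 11.0, 20.0], 3, lambda a, b: abs(a - b)
)
# centers == [0, 4, 2], labels == [0, 0, 2, 2, 1]
```

Each new center costs one pass of n distance evaluations, so the whole run needs O(nk) calls to `dist`. This is why FPF-style methods suit large document collections where only a metric is available.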