Documents often contain inherently many concepts reflecting specific and generic aspects. To automatically generate a short summary text of documents on similar topics, it is im...
Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured...
Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke S....
Probabilistic modelling of text data in the bagof-words representation has been dominated by directed graphical models such as pLSI, LDA, NMF, and discrete PCA. Recently, state of...
We describe experiments with a Naive Bayes text classifier in the context of anti-spam E-mail filtering, using two different statistical event models: a multi-variate Bernoulli ...
Language modeling is an effective and theoretically attractive probabilistic framework for text information retrieval. The basic idea of this approach is to estimate a language mo...