This paper explores correspondence and mixture topic modeling of documents tagged from two different perspectives. There has been ongoing work in topic modeling of documents with...
We propose a semantic tagger that provides high level concept information for phrases based on several kinds of low level information about words in clinical narrative texts. The ...
To circumvent spam filters, many spammers attempt to obfuscate their emails by deliberately misspelling words or introducing other errors into the text. For example viagra may be...
In this paper we bring to light a novel intersection between corpus linguistics and behavioral data that can be employed as an evaluation metric for resources for low-density lang...
It is well known that occurrence counts of words in documents are often modeled poorly by standard distributions like the binomial or Poisson. Observed counts vary more than simpl...