In this paper, we describe a new approach for mining concept associations from large text collections. The concepts are short sequences of words that occur frequently together acr...
We consider classification of email messages as to whether or not they contain certain “email acts”, such as a request or a commitment. We show that exploiting the sequential ...
This paper describes a method of detecting Japanese Katakana variants from a large corpus. Katakana words, most of which are loanwords, cause problems with information retr...
There has recently been a significant increase in the number of community-based question-and-answer services on the Web where people answer other people’s questions. These serv...
We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is orders of magnitude faster than typical web page classific...