Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources o...
This paper describes a rather simplistic method of unsupervised morphological analysis of words in an unknown language. All what is needed is a raw text corpus in the given langua...
Current Data Mining techniques usually do not have a mechanism to automatically infer semantic features inherent in the data being “mined”. The semantics are either injected i...
A sequential pattern in data mining is a finite series of elements such as A → B → C → D where A, B, C, and D are elements of the same domain. The mining of sequential patte...
Pak Chung Wong, Wendy Cowley, Harlan Foote, Elizab...
Feature selection for unsupervised tasks is particularly challenging, especially when dealing with text data. The increase in online documents and email communication creates a nee...
Nirmalie Wiratunga, Robert Lothian, Stewart Massie