This paper replicates and extends Observed Trends in Spam Construction Techniques: A Case Study of Spam Evolution. A corpus of 169,274 spam email was collected over a period of fi...
Tokenization is one of the initial steps done for almost any text processing task. It is not particularly recognized as a challenging task for English monolingual systems but it r...
We present and partially evaluate procedures for the extraction of noun+verb collocation candidates from German text corpora, along with their morphosyntactic preferences, especia...
The Italian particle ne exhibits interesting anaphoric properties that have not been yet explored in depth from a corpus and computational linguistic perspective. We provide: (i) ...
The DAM-LR project aims at virtually integrating various European language resource archives that allow users to navigate and operate in a single unified domain of language resour...