Recent advances in compressed data structures have led to the new concept of self-indexing; it is possible to represent a sequence of symbols compressed in a form that enables fas...
We consider the problem of compressibility of protein sequences. Based on an observed genome-scale long-range correlation in concatenated protein sequences from different organism...
We consider the problem of PAC-learning distributions over strings, represented by probabilistic deterministic finite automata (PDFAs). PDFAs are a probabilistic model for the gen...
While developing lexical resources for a particular language variety (Viennese), we experimented with a set of 5 different phonetic encodings, termed phone sets, used for unit sel...
Abstract. Information systems are socio-technical systems. Their design, analysis and implementation requires appropriate languages for representing social and technical concepts. ...