Background: The most common substitution matrices currently used (BLOSUM and PAM) are based on protein sequences with average amino acid distributions, thus they do not represent ...
Background: Physical maps are the substrate of genome sequencing and map-based cloning and their construction relies on the accurate assembly of BAC clones into large contigs that...
Zeev Frenkel, Etienne Paux, David I. Mester, Cathe...
Abstract Clustering text data streams is an important issue in data mining community and has a number of applications such as news group filtering, text crawling, document organiza...
With the advent of XML as the de facto language for data publishing and exchange, scalable distribution of XML data to large, dynamic populations of consumers remains an important...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...