https://socictopen.socict.org/files/original/75999d80ff694a1b3b435c2914e1b964.pdf 806fbcf11945cb199daa4ae9c7f0607d Dublin Core The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/. Title A name given to the resource Coronavirus Description An account of the resource Dominio científico: Coronavirus Text A resource consisting primarily of words for reading. Examples include books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre Text. Dublin Core The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/. Title A name given to the resource Efficient counting of <b><it>k</it></b>-mers in DNA sequences using a bloom filter Creator An entity primarily responsible for making the resource Melsted Páll, Pritchard Jonathan K Description An account of the resource Abstract Background Counting k-mers (substrings of length k in DNA sequence data) is an essential component of many methods in bioinformatics, including for genome and transcriptome assembly, for metagenomic sequencing, and for error correction of sequence reads. Although simple in principle, counting k-mers in large modern sequence data sets can easily overwhelm the memory capacity of standard computers. In current data sets, a large fraction-often more than 50%-of the storage capacity may be spent on storing k-mers that contain sequencing errors and which are typically observed only a single time in the data. These singleton k-mers are uninformative for many algorithms without some kind of error correction. Results We present a new method that identifies all the k-mers that occur more than once in a DNA sequence data set. Our method does this using a Bloom filter, a probabilistic data structure that stores all the observed k-mers implicitly in memory with greatly reduced memory requirements. We then make a second sweep through the data to provide exact counts of all nonunique k-mers. For example data sets, we report up to 50% savings in memory usage compared to current software, with modest costs in computational speed. This approach may reduce memory requirements for any algorithm that starts by counting k-mers in sequence data with errors. Conclusions A reference implementation for this methodology, BFCounter, is written in C++ and is GPL licensed. It is available for free download at http://pritch.bsd.uchicago.edu/bfcounter.html Date A point or period of time associated with an event in the lifecycle of the resource 2011 Identifier An unambiguous reference to the resource within a given context DOI: 10.1186/1471-2105-12-333 Source A related resource from which the described resource is derived BMC Bioinformatics Publisher An entity responsible for making the resource available BMC Coverage The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant Biology (General), Computer applications to medicine. Medical informatics Language A language of the resource EN