https://socictopen.socict.org/files/original/97df682f7e212e67d4a9224a9c75f4d0.pdf
68654b24f0c3aa07861e4134c4300324

Dublin Core
The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information, see http://dublincore.org/documents/dces/.

Title (a name given to the resource): Coronavirus
Description (an account of the resource): Scientific domain: Coronavirus

Text
A resource consisting primarily of words for reading. Examples include books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre Text.

Title: Estimating the total genome length of a metagenomic sample using k-mers
Creator (an entity primarily responsible for making the resource): Kui Hua, Xuegong Zhang
Description: Abstract

Background: Metagenomic sequencing is a powerful technology for studying mixtures of microbes, or microbiomes, on humans and in the environment. One basic task in analyzing metagenomic data is to identify the component genomes in the community. This task is challenging due to the complexity of microbiome composition, the limited availability of known reference genomes, and usually insufficient sequencing coverage.

Results: As an initial step toward understanding the complete composition of a metagenomic sample, we studied the problem of estimating the total length of all distinct component genomes in a metagenomic sample. We showed that this problem can be solved by estimating the total number of distinct k-mers in all the metagenomic sequencing data. We proposed a method for this estimation based on the sequencing coverage distribution of observed k-mers, and introduced a k-mer redundancy index (KRI) to bridge the gap between the count of distinct k-mers and the total genome length. We showed the effectiveness of the proposed method on a set of carefully designed simulation data corresponding to multiple situations of true metagenomic data. Results on real data indicate that the uncaptured genomic information can vary dramatically across metagenomic samples, with the potential to mislead downstream analyses.

Conclusions: We posed the question of what the total genome length of all distinct species in a microbial community is, and introduced a method to answer it.

Date (a point or period of time associated with an event in the lifecycle of the resource): 2019
Subject (the topic of the resource): Metagenomics, Sequencing coverage, Distinct k-mers, Genome length
Identifier (an unambiguous reference to the resource within a given context): DOI: 10.1186/s12864-019-5467-x
Source (a related resource from which the described resource is derived): BMC Genomics
Publisher (an entity responsible for making the resource available): BMC
Coverage (the spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant): Genetics, Biotechnology
Language (a language of the resource): EN
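
The abstract relates total genome length to the number of distinct k-mers and to the sequencing coverage distribution of the observed k-mers. As a rough illustration of those two quantities only, the sketch below counts k-mers in a handful of reads and tabulates how many distinct k-mers were seen at each coverage. It is not the authors' estimator: it makes no attempt to estimate unobserved k-mers and does not compute the KRI. The function names `count_kmers` and `coverage_histogram`, the toy reads, and the forward-strand-only counting are all assumptions made for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (forward strand only, an assumed simplification)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        # A read shorter than k yields an empty range and contributes nothing.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def coverage_histogram(kmer_counts):
    """Map coverage c to the number of distinct k-mers observed exactly c times."""
    return Counter(kmer_counts.values())


# Toy reads, made up purely for illustration.
reads = ["ACGTAC", "CGTACG", "TTTTTT"]
kmer_counts = count_kmers(reads, k=3)
histogram = coverage_histogram(kmer_counts)
observed_distinct = len(kmer_counts)

print("distinct k-mers observed:", observed_distinct)
print("coverage histogram:", dict(histogram))
```

On these toy reads, four 3-mers are seen twice and one is seen four times. The count of observed distinct k-mers covers only what the reads happened to capture, which is why the abstract describes estimating the total from the coverage distribution rather than reading it off directly.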