http://socictopen.socict.org/files/original/a50f0de867579b92e1380b184f4e972d.pdf b4b19a39a4e0ac1be0e6297580cecd9f Dublin Core The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/. Title A name given to the resource Coronavirus Description An account of the resource Dominio científico: Coronavirus Text A resource consisting primarily of words for reading. Examples include books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre Text. Dublin Core The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/. Title A name given to the resource Reference-free validation of short read data. Creator An entity primarily responsible for making the resource Jan Schröder, James Bailey, Thomas Conway, Justin Zobel Description An account of the resource BACKGROUND: High-throughput DNA sequencing techniques offer the ability to rapidly and cheaply sequence material such as whole genomes. However, the short-read data produced by these techniques can be biased or compromised at several stages in the sequencing process; the sources and properties of some of these biases are not always known. Accurate assessment of bias is required for experimental quality control, genome assembly, and interpretation of coverage results. An additional challenge is that, for new genomes or material from an unidentified source, there may be no reference available against which the reads can be checked. RESULTS: We propose analytical methods for identifying biases in a collection of short reads, without recourse to a reference. These, in conjunction with existing approaches, comprise a methodology that can be used to quantify the quality of a set of reads. Our methods involve use of three different measures: analysis of base calls; analysis of k-mers; and analysis of distributions of k-mers. We apply our methodology to wide range of short read data and show that, surprisingly, strong biases appear to be present. These include gross overrepresentation of some poly-base sequences, per-position biases towards some bases, and apparent preferences for some starting positions over others. CONCLUSIONS: The existence of biases in short read data is known, but they appear to be greater and more diverse than identified in previous literature. Statistical analysis of a set of short reads can help identify issues prior to assembly or resequencing, and should help guide chemical or statistical methods for bias rectification. Date A point or period of time associated with an event in the lifecycle of the resource 2010 Identifier An unambiguous reference to the resource within a given context DOI: 10.1371/journal.pone.0012681 Source A related resource from which the described resource is derived PLoS ONE Publisher An entity responsible for making the resource available Public Library of Science (PLoS) Coverage The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant Science, Medicine Language A language of the resource EN