https://socictopen.socict.org/files/original/f6fc09b803a3e9a7358a71f4f3d4b713.pdf

Dublin Core
The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information, see http://dublincore.org/documents/dces/.

Title: Coronavirus
Description: Scientific domain: Coronavirus

Type: Text

Title: Better quality score compression through sequence-based quality smoothing
Creator: Yoshihiro Shibuya, Matteo Comin
Description: Abstract
Motivation: Current NGS techniques are becoming exponentially cheaper. As a result, genomic data is growing exponentially, but storage capacity is not keeping pace, which makes compression necessary. Most of the entropy of NGS data lies in the quality values associated with each read. Those values are often more diverse than necessary. For that reason, many tools, such as Quartz or GeneCodeq, change (smooth) quality scores to improve compressibility without altering the important information they carry for downstream analyses such as SNP calling.
Results: We use the FM-Index, a type of compressed suffix array, to reduce the storage requirements of a k-mer dictionary, and an effective smoothing algorithm to maintain high precision in SNP calling pipelines while reducing the entropy of the quality scores.
We present YALFF (Yet Another Lossy Fastq Filter), a tool that compresses quality scores by smoothing them, which improves the compressibility of FASTQ files. The succinct k-mer dictionary allows YALFF to run on consumer computers with only 5.7 GB of free RAM. YALFF's smoothing algorithm can improve genotyping accuracy while using fewer resources.
Availability: https://github.com/yhhshb/yalff
Date: 2019
Subject: FASTQ compression, BWT, FM-Index
Identifier: DOI: 10.1186/s12859-019-2883-5
Source: BMC Bioinformatics
Publisher: BMC
Coverage: Biology (General), Computer applications to medicine. Medical informatics
Language: EN
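
The smoothing idea summarized in the abstract can be illustrated with a minimal sketch. It is not YALFF's algorithm. It assumes that a plain Python set stands in for the FM-Index-backed k-mer dictionary, and it applies a simplified rule, loosely in the spirit of tools such as Quartz: every base covered by a dictionary k-mer receives one fixed high quality character. The names `smooth_qualities`, `run_count`, `trusted_kmers`, and `HIGH_QUALITY` are hypothetical and chosen only for this illustration.

```python
from itertools import groupby

# Assumed replacement score: "I" is Phred 40 in the Phred+33 encoding.
HIGH_QUALITY = "I"


def smooth_qualities(seq, qual, trusted_kmers, k):
    """Return a smoothed quality string for one read.

    Every base covered by at least one k-mer found in `trusted_kmers`
    gets HIGH_QUALITY; all other bases keep their original score.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in trusted_kmers:
            for j in range(i, i + k):
                covered[j] = True
    return "".join(HIGH_QUALITY if c else q for c, q in zip(covered, qual))


def run_count(s):
    """Count runs of identical characters, a rough proxy for compressibility."""
    return sum(1 for _ in groupby(s))


# Toy read: the third base differs from the trusted sequence "ACGTACGT".
trusted = {"ACGT", "CGTA", "GTAC", "TACG"}
seq = "ACTTACGT"
qual = "F#5:GH?B"
smoothed = smooth_qualities(seq, qual, trusted, k=4)  # "F#5IIIII"
```

In this toy case the bases up to and including the mismatch keep their original scores, so the evidence about a possible variant or sequencing error is preserved, while the rest of the read collapses into a single run that a general-purpose compressor can encode compactly (`run_count` drops from 8 to 4).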