Better quality score compression through sequence-based quality smoothing

Title

Better quality score compression through sequence-based quality smoothing

Author

Yoshihiro Shibuya, Matteo Comin

Description

Motivation: The cost of current NGS techniques is falling exponentially. As a result, genomic data is growing exponentially, while storage capacity is not growing at the same rate, which makes compression necessary. Most of the entropy of NGS data lies in the quality values associated with each read, and those values are often more diverse than necessary. For this reason, many tools, such as Quartz and GeneCodeq, modify (smooth) quality scores to improve compressibility without altering the information that matters for downstream analyses such as SNP calling.
Results: We use the FM-Index, a type of compressed suffix array, to reduce the storage requirements of a k-mer dictionary, together with an effective smoothing algorithm that maintains high precision in SNP calling pipelines while reducing the entropy of the quality scores. We present YALFF (Yet Another Lossy Fastq Filter), a tool that compresses quality scores by smoothing them, which improves the compressibility of FASTQ files. The succinct k-mer dictionary allows YALFF to run on consumer computers with only 5.7 GB of free RAM. YALFF's smoothing algorithm can improve genotyping accuracy while using fewer resources.
Availability: https://github.com/yhhshb/yalff

Date

2019

Subject

FASTQ compression, BWT, FM-Index

Identifier

DOI: 10.1186/s12859-019-2883-5

Source

BMC Bioinformatics

Publisher

BMC

Coverage

Biology (General), Computer applications to medicine. Medical informatics

Language

EN

Files

https://socictopen.socict.org/files/to_import/pdfs/article 1228.pdf

Citation

Yoshihiro Shibuya, Matteo Comin, “Better quality score compression through sequence-based quality smoothing,” SOCICT Open, accessed April 20, 2026, https://socictopen.socict.org/items/show/1189.
