Comparison of discriminative motif optimization using matrix and DNA shape-based models

Title

Comparison of discriminative motif optimization using matrix and DNA shape-based models

Author

Shuxiang Ruan, Gary D Stormo

Description

Abstract. Background: Transcription factor (TF) binding site specificity is commonly represented by some form of matrix model in which the positions in the binding site are assumed to contribute independently to the site’s activity. The independence assumption is known to be an approximation, often a good one but sometimes poor. Alternative approaches have been developed that use k-mers (DNA “words” of length k) to account for the non-independence, and more recently DNA structural parameters have been incorporated into the models. ChIP-seq data are often used to assess the discriminatory power of motifs and to compare different models. However, to measure the improvement due to using more complex models, one must compare against optimized matrix models. Results: We describe a program, “Discriminative Additive Model Optimization” (DAMO), that uses positive and negative examples, as in ChIP-seq data, and finds the additive position weight matrix (PWM) that maximizes the Area Under the Receiver Operating Characteristic Curve (AUROC). We compare our results to a recent study in which structural parameters, serving as features in a gradient boosting classifier algorithm, are shown to improve the AUROC over JASPAR position frequency matrices (PFMs). In agreement with the previous results, we find that adding structural parameters gives the largest improvement, but most of the gain can be obtained by an optimized PWM, and nearly all of the gain can be obtained with a di-nucleotide extension to the PWM. Conclusion: To appropriately compare different models for TF binding sites, optimized models must be used. PWMs and their extensions are good representations of binding specificity for most TFs, and more complex models, including the incorporation of DNA shape features and gradient boosting classifiers, provide only moderate improvements for a few TFs.
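The quantity the abstract refers to, the AUROC of an additive PWM over positive and negative sequences, can be illustrated with a minimal Python sketch. This is not the DAMO program and performs no optimization; it only evaluates a fixed matrix. The function names, the pseudocount, the uniform background, the forward-strand-only scan, and the toy sequences are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into additive log-odds weights.

    The pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for counts in pfm:
        total = sum(counts[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def best_site_score(pwm, seq):
    """Score a sequence by its best window; each position adds independently.

    Only the forward strand is scanned (a simplification).
    """
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )

def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical toy PFM for the motif TGCA (8 observed sites, no variation).
toy_pfm = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
toy_pwm = pfm_to_pwm(toy_pfm)

positives = ["AATGCAAA", "CCTGCAGG"]  # stand-ins for ChIP-seq peak sequences
negatives = ["AAAAAAAA", "CCCCGGGG"]  # stand-ins for background sequences

pos_scores = [best_site_score(toy_pwm, s) for s in positives]
neg_scores = [best_site_score(toy_pwm, s) for s in negatives]
toy_auroc = auroc(pos_scores, neg_scores)
```

A discriminative optimizer in the spirit of the abstract would adjust the matrix weights to raise this AUROC on real positive and negative sets, rather than deriving them from counts as done here.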

Date

2018

Subject

motif, Motif optimization, ChIP-seq, position weight matrix, DNA shape features

Identifier

DOI: 10.1186/s12859-018-2104-7

Source

BMC Bioinformatics

Publisher

BMC

Coverage

Biology (General), Computer applications to medicine. Medical informatics

Language

EN

Files

https://socictopen.socict.org/files/to_import/pdfs/article 2216.pdf

Collection

Citation

Shuxiang Ruan, Gary D Stormo, “Comparison of discriminative motif optimization using matrix and DNA shape-based models,” SOCICT Open, accessed April 18, 2026, https://socictopen.socict.org/items/show/2160.
