Md. Abul Hassan Samee, Benoit G. Bruneau, Katherine S. Pollard
Open Access Published: January 16, 2019
Highlights
• The ShapeMF algorithm works on DNA shape data, extending de novo motif discovery
• ShapeMF identifies shape motifs enriched in regions bound by transcription factors
• Many transcription factors have shape motifs in both in vivo and in vitro data
• Shape motifs may encode specificity that goes beyond the sequence motifs of a TF
Summary
DNA shape adds specificity to sequence motifs but has not been explored systematically outside this context. We hypothesized that DNA-binding proteins (DBPs) preferentially occupy DNA with specific structures (“shape motifs”) regardless of whether or not these correspond to high information content sequence motifs. We present ShapeMF, a Gibbs sampling algorithm that identifies de novo shape motifs. Using binding data from hundreds of in vivo and in vitro experiments, we show that most DBPs have shape motifs and can occupy these in the absence of sequence motifs. This “shape-only binding” is common for many DBPs and in regions co-bound by multiple DBPs. When shape and sequence motifs co-occur, they can be overlapping, flanking, or separated by consistent spacing. Finally, DBPs within the same protein family have different shape motifs, explaining their distinct genome-wide occupancy despite having similar sequence motifs. These results suggest that shape motifs not only complement sequence motifs but also facilitate recognition of DNA beyond conventionally defined sequence motifs.
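As an illustration only, the general idea of Gibbs sampling for de novo motif discovery on numeric shape profiles can be sketched as below. This is not the ShapeMF implementation. The function name `gibbs_shape_motif`, the single shape feature per position, the one-occurrence-per-region assumption, and the Gaussian motif and background models are all hypothetical choices made for this sketch.

```python
import numpy as np


def gibbs_shape_motif(profiles, width, n_iter=200, seed=0):
    """Toy Gibbs sampler over real-valued shape profiles.

    Assumptions (not taken from the paper): each profile is a 1D array of one
    shape feature, each profile contains exactly one motif occurrence, and
    motif and background values are modelled as Gaussians.
    """
    rng = np.random.default_rng(seed)
    profiles = [np.asarray(p, dtype=float) for p in profiles]
    all_values = np.concatenate(profiles)
    bg_mean, bg_std = all_values.mean(), all_values.std() + 1e-9

    # Start from a random window in every profile.
    starts = [int(rng.integers(0, len(p) - width + 1)) for p in profiles]

    for _ in range(n_iter):
        for i, p in enumerate(profiles):
            # Estimate the motif from every current window except the held-out one.
            others = np.array([q[s:s + width]
                               for j, (q, s) in enumerate(zip(profiles, starts))
                               if j != i])
            motif_mean = others.mean(axis=0)
            motif_std = others.std(axis=0) + 0.1 * bg_std  # regularised spread

            # Score every candidate window by its motif-vs-background
            # log-likelihood ratio, summed over window positions.
            windows = np.lib.stride_tricks.sliding_window_view(p, width)
            log_motif = -0.5 * ((windows - motif_mean) / motif_std) ** 2 - np.log(motif_std)
            log_bg = -0.5 * ((windows - bg_mean) / bg_std) ** 2 - np.log(bg_std)
            scores = (log_motif - log_bg).sum(axis=1)

            # Sample a new start for the held-out profile in proportion to its score.
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()
            starts[i] = int(rng.choice(len(probs), p=probs))

    # Average the final windows to summarise the sampled motif.
    motif = np.mean([q[s:s + width] for q, s in zip(profiles, starts)], axis=0)
    return starts, motif
```

Called as `starts, motif = gibbs_shape_motif(profiles, width=5)`, the sketch returns one sampled window start per profile and the mean shape profile across those windows. A realistic tool would additionally need multiple shape features, regions without any occurrence, and a test of enrichment against control regions, none of which is modelled here.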
Keywords: transcription factor, DNA binding protein, DNA shape, shape motif, sequence motif, algorithm, de novo shape motif discovery, shape specificity, shape-specific binding, shape-only binding, ChIP-Seq, HT-SELEX, Gibbs sampling
Free PDF: Cell Systems