A binary search approach to whole-genome data analysis
Leonid Brodsky a,1, Simon Kogan a, Eshel Ben-Jacob b, and Eviatar Nevo a,1
Author Affiliations
a Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel; and
b School of Physics and Astronomy, Tel Aviv University, Tel Aviv 69978, Israel
Contributed by Eviatar Nevo, August 2, 2010 (sent for review February 6, 2010)
Abstract
A sequence analysis-oriented, binary search-like algorithm was transformed into a sensitive and accurate tool for processing whole-genome data. Its advantage over previous methods is its ability to detect, with equal accuracy, the margins of both short and long genome fragments enriched in up-regulated signals. The score of an enriched genome fragment reflects the difference between the actual concentration of up-regulated signals in the fragment and the chromosome signal baseline. The “divide-and-conquer”-type algorithm detects a series of nonintersecting fragments of various lengths with locally optimal scores. The procedure is then applied to the detected fragments in a nested manner by recalculating the lower-than-baseline signals in the chromosome. The algorithm was applied to simulated whole-genome data, and its sensitivity and specificity were compared with those of several alternative algorithms. The algorithm was also tested on four biological tiling array datasets: Arabidopsis (i) expression and (ii) histone 3 lysine 27 trimethylation ChIP-on-chip datasets, and Saccharomyces cerevisiae (iii) spliced intron data and (iv) chromatin remodeling factor binding sites. The results demonstrate the power of the algorithm in identifying both short up-regulated fragments (such as exons and transcription factor binding sites) and long, even moderately up-regulated, zones at their precise genome margins. The algorithm generates an accurate whole-genome landscape that could be used for cross-comparison of signals across the same genome in evolutionary and general genomic studies.
genome segmentation, tiling array, next-generation sequencing
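The scoring and divide-and-conquer ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the exact score or the binary-search procedure, so the sketch assumes a fragment's score is the sum of (signal minus baseline) over its probes, and it swaps in a standard maximum-sum scan (Kadane's algorithm) to find the best-scoring fragment before recursing on the flanks to either side. The names `best_fragment`, `segment`, and `min_score` are hypothetical, and the nested re-analysis of detected fragments is omitted.

```python
from typing import List, Tuple

Fragment = Tuple[int, int, float]  # (start, end, score), end is exclusive


def best_fragment(excess: List[float], lo: int, hi: int) -> Tuple[float, int, int]:
    """Maximum-sum contiguous run in excess[lo:hi] (Kadane's scan).

    Returns (score, start, end) with end exclusive; score is 0.0 and the
    fragment is empty when no run has a positive sum.
    """
    best = (0.0, lo, lo)
    run_sum, run_start = 0.0, lo
    for i in range(lo, hi):
        if run_sum <= 0:
            # A non-positive prefix cannot help: restart the run here.
            run_sum, run_start = 0.0, i
        run_sum += excess[i]
        if run_sum > best[0]:
            best = (run_sum, run_start, i + 1)
    return best


def segment(signal: List[float], baseline: float, min_score: float) -> List[Fragment]:
    """Split a chromosome-wide signal into nonintersecting enriched fragments.

    Assumption: score = sum of (signal - baseline) inside the fragment.
    The best fragment is taken first, then the left and right flanks are
    searched recursively (divide and conquer), so fragments never overlap.
    """
    excess = [x - baseline for x in signal]
    found: List[Fragment] = []

    def recurse(lo: int, hi: int) -> None:
        if lo >= hi:
            return
        score, start, end = best_fragment(excess, lo, hi)
        if score < min_score:
            return  # nothing sufficiently enriched in this interval
        recurse(lo, start)                  # left flank
        found.append((start, end, score))   # keeps genome order
        recurse(end, hi)                    # right flank

    recurse(0, len(signal))
    return found


# Toy signal: a short strong fragment and a long moderate one.
toy = [0, 0] + [5] * 3 + [0] * 8 + [2] * 6 + [0] * 2
print(segment(toy, baseline=1.0, min_score=3.0))
# → [(2, 5, 12.0), (13, 19, 6.0)]
```

In this toy run both the short strong fragment and the long moderate one are recovered at their exact margins. With a shorter gap between them, a plain maximum-sum scan would merge the two into one fragment, which is the kind of case the nested re-analysis described in the abstract is meant to handle.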
Footnotes
1 To whom correspondence may be addressed. E-mail: lbrodsky@research.haifa.ac.il or nevo@research.haifa.ac.il.
Author contributions: L.B. and E.N. designed research; L.B. and S.K. performed research; L.B., S.K., E.B., and E.N. analyzed data; and L.B., E.B., and E.N. wrote the paper.
The authors declare no conflict of interest.
This article contains supporting information online at