A binary search approach to whole-genome data analysis

Monday, September 13, 2010

A binary search approach to whole-genome data analysis

Leonid Brodsky a,1, Simon Kogan a, Eshel BenJacob b, and Eviatar Nevo a,1

Author Affiliations

a Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel; and

b School of Physics and Astronomy, Tel Aviv University, Tel Aviv 69978, Israel

Contributed by Eviatar Nevo, August 2, 2010 (sent for review February 6, 2010)

Abstract

A sequence analysis-oriented binary search-like algorithm was transformed into a sensitive and accurate analysis tool for processing whole-genome data. The advantage of the algorithm over previous methods is its ability to detect the margins of both short and long genome fragments, enriched by up-regulated signals, at equal accuracy. The score of an enriched genome fragment reflects the difference between the actual concentration of up-regulated signals in the fragment and the chromosome signal baseline. The “divide-and-conquer”-type algorithm detects a series of nonintersecting fragments of various lengths with locally optimal scores. The procedure is applied to detected fragments in a nested manner by recalculating the lower-than-baseline signals in the chromosome. The algorithm was applied to simulated whole-genome data, and its sensitivity and specificity were compared with those of several alternative algorithms. The algorithm was also tested with four biological tiling array datasets comprising Arabidopsis (i) expression and (ii) histone 3 lysine 27 trimethylation ChIP-on-chip datasets; Saccharomyces cerevisiae (iii) spliced intron data and (iv) chromatin remodeling factor binding sites. The results of the analyses demonstrate the power of the algorithm in identifying both the short up-regulated fragments (such as exons and transcription factor binding sites) and the long, even moderately up-regulated, zones at their precise genome margins. The algorithm generates an accurate whole-genome landscape that could be used for cross-comparison of signals across the same genome in evolutionary and general genomic studies.
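The abstract does not give the authors' procedure, so the following is only a minimal sketch of the general idea it describes, under stated assumptions: the baseline is taken to be the chromosome-wide mean signal, a fragment's score is the sum of (signal minus baseline) over its positions, and the best fragment in a region is found with a standard maximum-sum scan (Kadane's algorithm) used here as a stand-in. The region to the left and right of each detected fragment is then searched recursively, which yields nonintersecting fragments. The names best_fragment, segment, and min_score are hypothetical, and the nested re-analysis inside detected fragments is not shown.

```python
from typing import List, Sequence, Tuple


def best_fragment(x: Sequence[float], lo: int, hi: int,
                  baseline: float) -> Tuple[float, int, int]:
    """Return (score, start, end) of the max-sum run of (x[i] - baseline)
    within the half-open range [lo, hi). Score 0.0 means no enriched run."""
    best = (0.0, lo, lo)
    run_sum, run_start = 0.0, lo
    for i in range(lo, hi):
        if run_sum <= 0.0:
            # A non-positive prefix can never help; start a new run here.
            run_sum, run_start = 0.0, i
        run_sum += x[i] - baseline
        if run_sum > best[0]:
            best = (run_sum, run_start, i + 1)
    return best


def segment(x: Sequence[float],
            min_score: float = 1.0) -> List[Tuple[int, int, float]]:
    """Divide-and-conquer segmentation: take the best-scoring fragment,
    then search the flanks on each side. min_score is an assumed cutoff."""
    baseline = sum(x) / len(x)  # assumption: baseline = chromosome-wide mean
    found: List[Tuple[int, int, float]] = []

    def recurse(lo: int, hi: int) -> None:
        if hi <= lo:
            return
        score, start, end = best_fragment(x, lo, hi, baseline)
        if score < min_score:
            return
        found.append((start, end, score))
        recurse(lo, start)  # left flank
        recurse(end, hi)    # right flank

    recurse(0, len(x))
    return sorted(found)
```

Calling segment on a 0/1 vector of up-regulated calls returns (start, end, score) tuples with half-open coordinates, sorted by position. Raising min_score suppresses short, weakly enriched fragments; the value used here is arbitrary.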

Keywords: genome segmentation, tiling array, next-generation sequencing

Footnotes

1 To whom correspondence may be addressed. E-mail: lbrodsky@research.haifa.ac.il or nevo@research.haifa.ac.il.

Author contributions: L.B. and E.N. designed research; L.B. and S.K. performed research; L.B., S.K., E.B., and E.N. analyzed data; and L.B., E.B., and E.N. wrote the paper.

The authors declare no conflict of interest.

This article contains supporting information online at


+++++

Free PDF of this article here.

+++++