Defining functional DNA elements in the human genome
Manolis Kellis^a,b,1,2, Barbara Wold^c,2, Michael P. Snyder^d,2, Bradley E. Bernstein^b,e,f,2, Anshul Kundaje^a,b,3, Georgi K. Marinov^c,3, Lucas D. Ward^a,b,3, Ewan Birney^g, Gregory E. Crawford^h, Job Dekker^i, Ian Dunham^g, Laura L. Elnitski^j, Peggy J. Farnham^k, Elise A. Feingold^j, Mark Gerstein^l, Morgan C. Giddings^m, David M. Gilbert^n, Thomas R. Gingeras^o, Eric D. Green^j, Roderic Guigo^p, Tim Hubbard^q, Jim Kent^r, Jason D. Lieb^s, Richard M. Myers^t, Michael J. Pazin^j, Bing Ren^u, John A. Stamatoyannopoulos^v, Zhiping Weng^i, Kevin P. White^w, and Ross C. Hardison^x,1,2
Edited by Robert Haselkorn, University of Chicago, Chicago, IL, and approved January 29, 2014 (received for review October 16, 2013)
With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.
^1 To whom correspondence may be addressed. E-mail: firstname.lastname@example.org or email@example.com.
^2 M.K., B.W., M.P.S., B.E.B., and R.C.H. contributed equally to this work.
^3 A.K., G.K.M., and L.D.W. contributed equally to this work.
Author contributions: M.K., B.W., M.P.S., B.E.B., and R.C.H. designed research; M.K., B.W., M.P.S., B.E.B., A.K., G.K.M., L.D.W., and R.C.H. performed research; A.K., G.K.M., and L.D.W. contributed computational analysis and tools; M.K., B.W., M.P.S., B.E.B., E.B., G.E.C., J.D., I.D., L.L.E., P.J.F., E.A.F., M.G., M.C.G., D.M.G., T.R.G., E.D.G., R.G., T.H., J.K., J.D.L., R.M.M., M.J.P., B.R., J.A.S., Z.W., K.P.W., and R.C.H. contributed to manuscript discussions and ideas; and M.K., B.W., M.P.S., B.E.B., and R.C.H. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: In addition to data already released via the ENCODE Data Coordinating Center, the erythroblast DNase-seq data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession nos. GSE55579, GSM1339559, and GSM1339560).
Authored by members of the ENCODE Consortium.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1318948111/-/DCSupplemental.