Ab initio construction of a eukaryotic transcriptome

Wednesday, March 04, 2009

Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing

1. Moran Yassour (a,b,1)
2. Tommy Kaplan (a,c,1)
3. Hunter B. Fraser (b)
4. Joshua Z. Levin (b)
5. Jenna Pfiffner (b)
6. Xian Adiconis (b)
7. Gary Schroth (d)
8. Shujun Luo (d)
9. Irina Khrebtukova (d)
10. Andreas Gnirke (b)
11. Chad Nusbaum (b)
12. Dawn-Anne Thompson (b)
13. Nir Friedman (a,2)
14. Aviv Regev (b,e,2)

Author Affiliations

1. (a) School of Computer Science and Engineering, The Hebrew University, Jerusalem, 91904, Israel;

2. (b) Broad Institute of Massachusetts Institute of Technology and Harvard, 7 Cambridge Center, Cambridge, MA 02142;

3. (c) Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University, Jerusalem 91120, Israel;

4. (d) Illumina, Inc., 25861 Industrial Boulevard, Hayward, CA 94545; and

5. (e) Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142

Communicated by Eric S. Lander, The Broad Institute, Cambridge, MA, December 18, 2008 (received for review October 14, 2008)

(1) M.Y. and T.K. contributed equally to this work.

Abstract

Defining the transcriptome, the repertoire of transcribed regions encoded in the genome, is a challenging experimental task. Current approaches, relying on sequencing of ESTs or cDNA libraries, are expensive and labor-intensive. Here, we present a general approach for ab initio discovery of the complete transcriptome of the budding yeast, based only on the unannotated genome sequence and millions of short reads from a single massively parallel sequencing run. Using novel algorithms, we automatically construct a highly accurate transcript catalog. Our approach automatically and fully defines 86% of the genes expressed under the given conditions, and discovers 160 previously undescribed transcription units of 250 bp or longer. It correctly demarcates the 5′ and 3′ UTR boundaries of 86 and 77% of expressed genes, respectively. The method further identifies 83% of known splice junctions in expressed genes, and discovers 25 previously uncharacterized introns, including 2 cases of condition-dependent intron retention. Our framework is applicable to poorly understood organisms, and can lead to greater understanding of the transcribed elements in an explored genome.
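The abstract does not describe the algorithms themselves, so the following is only a toy illustration of the general idea of calling transcribed regions from short-read coverage. It is not the authors' method. The function name `call_transcribed_segments`, the per-base coverage list, and the `min_coverage` threshold are all assumptions made for this sketch; only the 250 bp minimum length is taken from the abstract.

```python
from typing import List, Tuple


def call_transcribed_segments(
    coverage: List[int],
    min_coverage: int = 1,
    min_length: int = 250,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals in which every position has
    read depth >= min_coverage and the run is at least min_length long.

    Hypothetical helper for illustration only; not the published algorithm.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # start of the current covered run, or None if not in a run
    for pos, depth in enumerate(coverage):
        if depth >= min_coverage:
            if start is None:
                start = pos
        elif start is not None:
            # The covered run ended at pos; keep it only if long enough.
            if pos - start >= min_length:
                segments.append((start, pos))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(coverage) - start >= min_length:
        segments.append((start, len(coverage)))
    return segments


# Toy coverage track (made-up numbers): 300 covered bases,
# a 50-base gap, then 100 covered bases.
toy_coverage = [5] * 300 + [0] * 50 + [3] * 100
segments = call_transcribed_segments(toy_coverage)
```

With the default 250 bp cutoff, only the first run in the toy track is long enough to be reported; lowering `min_length` to 100 would also report the second run. A real pipeline would additionally need read mapping, strand handling, and splice-junction detection, none of which this sketch attempts.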

Footnotes

(2) To whom correspondence may be addressed. E-mail: nir@cs.huji.ac.il or aregev@broad.mit.edu

Author contributions: M.Y., T.K., H.B.F., J.Z.L., C.N., D.-A.T., N.F., and A.R. designed research; M.Y., T.K., H.B.F., J.Z.L., J.P., X.A., D.-A.T., N.F., and A.R. performed research; M.Y., T.K., J.Z.L., J.P., X.A., G.S., S.L., I.K., A.G., C.N., D.-A.T., and N.F. contributed new reagents/analytic tools; M.Y., T.K., and N.F. analyzed data; and M.Y., T.K., N.F., and A.R. wrote the paper.

This article contains supporting information online at PNAS.

Freely available online through the PNAS open access option. [Free PDF]