The Tree of Life revisited: it looks more like a lawn...

Monday, March 21, 2011

Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees

Dongying Wu1, Martin Wu1,4, Aaron Halpern2,3, Douglas B. Rusch2,3, Shibu Yooseph2,3, Marvin Frazier2,3, J. Craig Venter2,3, Jonathan A. Eisen1*

1 Department of Evolution and Ecology, Department of Medical Microbiology and Immunology, University of California Davis Genome Center, University of California Davis, Davis, California, United States of America, 2 The J. Craig Venter Institute, Rockville, Maryland, United States of America, 3 The J. Craig Venter Institute, La Jolla, California, United States of America, 4 University of Virginia, Charlottesville, Virginia, United States of America



Background

Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generated directly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as we argue here, in studies of very early events in the evolution of gene families and of species.

Methodology/Principal Findings

We designed and implemented new methods for analyzing metagenomic data and used them to search the Global Ocean Sampling (GOS) Expedition data set for novel lineages in three gene families commonly used in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies. Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties in making robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novel branches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as these novel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences.


Conclusions/Significance

Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come from uncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A third possibility is that some come from novel cellular lineages that are only distantly related to any organisms for which sequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the tree of life, we suggest that methods such as those described herein currently offer the best way to search for them.

Citation: Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, et al. (2011) Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011

Editor: Robert Fleischer, Smithsonian Institution National Zoological Park, United States of America

Received: October 25, 2010; Accepted: February 20, 2011; Published: March 18, 2011

This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.

Funding: The development and main work on this project were supported by the National Science Foundation via an “Assembling the Tree of Life” grant (number 0228651) to Jonathan A. Eisen and Naomi Ward. The final work on this project was funded by the Gordon and Betty Moore Foundation (through grants 0000951 and 0001660). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.