Whole-proteome phylogeny of prokaryotes by feature frequency profiles: An alignment-free method with optimal feature resolution
Se-Ran Jun a, Gregory E. Sims a, Guohong A. Wu a and Sung-Hou Kim a,b,1
Author Affiliations
a Department of Chemistry, University of California, Berkeley, CA 94720
b Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
Contributed by Sung-Hou Kim, November 13, 2009 (sent for review October 6, 2009)
Abstract
We present a whole-proteome phylogeny of prokaryotes constructed by comparing feature frequency profiles (FFPs) of whole proteomes. Features are l-mers of amino acids, and each organism is represented by a profile of frequencies of all features. The selection of feature length is critical in the FFP method, and we have developed a procedure for identifying the optimal feature lengths for inferring the phylogeny of prokaryotes (strictly speaking, a proteome phylogeny). Our FFP trees are constructed with whole proteomes of 884 prokaryotes, 16 unicellular eukaryotes, and 2 random sequences. To highlight the branching order of major groups, we present a simplified proteome FFP tree of monophyletic classes or phyla, with branch support. In our whole-proteome FFP trees (i) Archaea, Bacteria, Eukaryota, and a random sequence outgroup are clearly separated; (ii) Archaea and Bacteria form a sister group when rooted with random sequences; (iii) Planctomycetes, which possesses an intracellular membrane compartment, is placed at the basal position of the Bacteria domain; (iv) almost all groups are monophyletic in prokaryotes at most taxonomic levels, but many differences in the branching order of major groups are observed between our proteome FFP tree and trees built with other methods; and (v) previously “unclassified” genomes may be assigned to the most likely taxa. We describe notable similarities and differences between our FFP trees and those based on other methods in grouping and phylogeny of prokaryotes.
branching order; l-mers; prokaryotic phylogeny; random sequence outgroup; whole-genome phylogeny
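As a rough illustration of the idea summarized in the abstract, the sketch below builds a feature frequency profile by counting overlapping l-mers across the proteins of a proteome and normalizing the counts to frequencies. The abstract does not say how two profiles are compared, so the Jensen-Shannon divergence used here is an assumption made for illustration only. The function names (`ffp`, `js_divergence`), the toy sequences, and the choice of l = 3 are likewise hypothetical; this is not the authors' implementation, and it does not attempt their procedure for choosing the optimal feature length.

```python
from collections import Counter
from math import log2


def ffp(proteins, l):
    """Feature frequency profile: relative frequency of each l-mer.

    `proteins` is an iterable of amino acid strings. l-mers are counted
    with a sliding window inside each protein, so no feature spans the
    boundary between two proteins.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(seq[i:i + l] for i in range(len(seq) - l + 1))
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two sparse profiles.

    ASSUMPTION: this distance is chosen for illustration; the abstract
    does not name the measure used to compare profiles.
    """
    divergence = 0.0
    for feature in p.keys() | q.keys():
        pi = p.get(feature, 0.0)
        qi = q.get(feature, 0.0)
        mi = 0.5 * (pi + qi)  # mixture of the two profiles
        if pi > 0:
            divergence += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * log2(qi / mi)
    return divergence


# Hypothetical toy "proteomes", each a list of short protein sequences.
proteome_a = ["MKVLAAGIVG", "MSTNPKPQRK"]
proteome_b = ["MKVLSAGIVA", "MSTNPKPQRK"]

profile_a = ffp(proteome_a, 3)
profile_b = ffp(proteome_b, 3)
distance = js_divergence(profile_a, profile_b)
```

Computing such a distance for every pair of organisms would give a distance matrix that a standard tree-building method could take as input; which method to use is left open here, since the abstract does not specify one.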
Footnotes
1 To whom correspondence should be addressed. E-mail: SHKim@cchem.berkeley.edu.
Author contributions: S.-R.J., G.E.S., and S.-H.K. designed research; S.-R.J. performed research; S.-R.J., G.E.S., and G.A.W. contributed new reagents/analytic tools; S.-R.J., G.E.S., G.A.W., and S.-H.K. analyzed data; and S.-R.J., G.E.S., G.A.W., and S.-H.K. wrote the paper.
The authors declare no conflict of interest.
This article contains supporting information online at www.pnas.org/cgi/content/full/0913033107/DCSupplemental.