Estimating the size of the bacterial pan-genome
Pascal Lapierre¹ and J. Peter Gogarten²
¹ University of Connecticut Biotechnology Center, 91 North Eagleville Road, Storrs, CT 06269-3149, USA
² Department of Molecular and Cell Biology, University of Connecticut, 91 North Eagleville Road, Storrs, CT 06269-3125, USA
Abstract
The ‘pan-genome’ denotes the set of all genes present in the genomes of a group of organisms. Here, we extend the pan-genome concept to higher taxonomic units. Using 573 sequenced genomes, we estimate the size of the bacterial pan-genome from the frequency with which genes occur among the sampled genomes. Using gene- and genome-centered approaches, we characterize three distinct pools of gene families that make up the bacterial pan-genome, each subject to different evolutionary constraints. Our findings indicate that the pan-genome of the bacterial domain is of infinite size (the Bacteria as a whole have an open pan-genome) and that 250 genes per genome belong to the extended bacterial core genome.