Overlapping genes in mammals: the comparative perspective

Thursday, December 02, 2010

Mammalian Overlapping Genes: The Comparative Perspective

Vamsi Veeramachaneni1,2, Wojciech Makalowski1,2, Michal Galdzicki4, Raman Sood4, and Izabela Makalowska2,3,5

Author Affiliations

1Institute of Molecular Evolutionary Genetics
2Department of Biology
3The Huck Institute of the Life Sciences, Pennsylvania State University, State College, University Park, Pennsylvania 16802, USA
4National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA

Abstract

It is believed that 3.2 billion bp of the human genome harbor ∼35,000 protein-coding genes. On average, one could expect one gene per 300,000 nucleotides (nt). Although the distribution of the genes in the human genome is not random, it is rather surprising that a large number of genes overlap in the mammalian genomes. Thousands of overlapping genes were recently identified in the human and mouse genomes. However, the origin and evolution of overlapping genes are still unknown. We identified 1316 pairs of overlapping genes in humans and mice and studied their evolutionary patterns. It appears that these genes do not demonstrate greater than usual conservation. Studies of the gene structure and overlap pattern showed that only a small fraction of analyzed genes preserved exactly the same pattern in both organisms.

Overlapping genes occur frequently in viral and cellular prokaryotic genomes as well as in organelles such as mitochondria (Normark et al. 1983). Until recently, it was believed that they occur much less frequently in eukaryotic nuclear genomes. Although their presence in human and other species' genomes was reported previously (Williams and Fried 1986; Lazar et al. 1989; Miyajima et al. 1989; Burke et al. 1998; Cooper et al. 1998; Bachman et al. 1999; Shintani et al. 1999; Misener and Walker 2000; Morelli et al. 2000; Slavov et al. 2000; Zhuo et al. 2001), until lately very little was known about their frequency and genome-wide distribution. Recent reports show that overlapping genes occur relatively frequently in human and other mammalian genomes (Kiyosawa and Abe 2002; Lehner et al. 2002; Okazaki et al. 2002; Shendure and Church 2002; Yelin et al. 2003). Nevertheless, there is still little known about the origin, evolution, or cross-species conservation of overlapping genes.

Shintani et al. (1999) suggested that the overlap between two genes studied by them, ACAT2 and TCP1, arose during the transition from therapsid reptiles to mammals, and that the overlap could have happened in one of two ways. In one scenario, the rearrangement may have been accompanied by the loss of a part of the 3′-untranslated region (UTR), including the polyadenylation signal, from one gene. By chance, however, the 3′-UTR of the new neighbor on the opposite strand contained all the signals necessary for termination and transcription so that the translocated gene could continue to function. Alternatively, the two genes became neighbors through the rearrangement but at first did not overlap. Later, one of the genes lost its original polyadenylation signal, but was able to use a signal that happened to be present on the noncoding strand of the other gene. Keese and Gibbs (1992) suspect that overlapping genes arise as a result of overprinting, a process of generating new genes from preexisting nucleotide sequences. However, both studies were done based on a single pair of eukaryotic overlapping genes. The hypothesis by Shintani et al. (1999) can only be applied to those overlapping gene pairs in which the overlap occurs at the 3′-end and does not include coding sequences. The hypothesis by Keese and Gibbs needs to be confirmed by larger studies. Interestingly, in both studies the time of origin of the gene overlaps was estimated to take place after the divergence of mammals from birds.

As suggested by Miyata and Yasunaga (1978), the rate of evolution can be expected to be slower in overlapping genes. This is in agreement with a study by Lipman (1997) in which the higher rate of conservation of noncoding sequences of some genes is explained by the presence of antisense transcripts. However, there is not enough experimental evidence that higher conservation is a common feature of coding and noncoding overlapping genes. Shintani et al. (1999) found high 3′-UTR conservation in only one of two studied overlapping genes, and Svaren et al. (1997) found only one area with higher conservation in 3′-UTRs of overlapping Stat6 and Nab2 genes, and, even then, the authors expect this partial conservation to be due to some additional regulatory functions and not necessarily due to the overlap between the genes.

Here we report a study of 774 overlapping gene pairs in human and 542 overlapping gene pairs in mouse as well as analysis of 778 human and mouse orthologous genes that, in at least one species, share exons with another gene.
RESULTS
Identification of Overlapping Genes

We used the NCBI human genome assembly Build 33 (April 2003) and the mouse genome assembly Build 30 (March 2003) as the sequence source for identification of overlapping genes. Out of 34,604 genes annotated in the human genome, we identified 774 pairs of overlapping genes, and of 33,936 analyzed genes in the mouse genome, we identified 578 pairs of overlapping genes. We focused on annotated genomic sequence genes only and did not include ESTs to get high-quality data for mouse-human comparison. As shown by other studies (Wolfsberg and Landsman 1997) as well as our work in the early stages of this study, EST sequences can be identified as overlapping because of chimeric sequences, mislabeling, and genomic sequence contamination. More than 10% of such identified, overlapping genes are artifacts (Yelin et al. 2003). Because chimeric sequences can also be found among annotated mRNAs (Lehner et al. 2002), we used genomic localization to confirm the presence of gene overlaps in our study. Our earlier studies showed that the mere presence of regions of complementarity between mRNAs is not sufficient evidence of an overlap and can lead to false-positive results. Because we wanted to compare human and mouse protein-coding genes, we excluded from our search noncoding genes, which on the genome scale are involved in ∼75% of gene overlaps (Kiyosawa et al. 2003).
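
Confirming overlaps by genomic localization can be pictured as an interval comparison on annotated coordinates. The following is only a minimal sketch of that idea, not the pipeline used in the study: the `Gene` record, its field names, and the 1-based inclusive coordinate convention are assumptions made for illustration, and no strand or exon filtering is applied.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Gene:
    # Hypothetical record; field names are illustrative, not from the paper.
    name: str
    chrom: str
    start: int   # leftmost genomic coordinate (1-based, inclusive)
    end: int     # rightmost genomic coordinate (inclusive)
    strand: str  # "+" or "-"


def overlapping_pairs(genes):
    """Return (name, name) pairs of genes whose genomic spans intersect."""
    pairs = []
    ordered = sorted(genes, key=lambda g: (g.chrom, g.start, g.end))
    for _, group in groupby(ordered, key=lambda g: g.chrom):
        active = []  # genes already seen that may still reach the current start
        for gene in group:
            # Drop genes that end before the current gene begins.
            active = [g for g in active if g.end >= gene.start]
            pairs.extend((g.name, gene.name) for g in active)
            active.append(gene)
    return pairs


if __name__ == "__main__":
    toy = [
        Gene("A", "chr1", 100, 500, "+"),
        Gene("B", "chr1", 400, 900, "-"),
        Gene("C", "chr1", 1000, 1200, "+"),
        Gene("D", "chr2", 400, 900, "-"),
    ]
    print(overlapping_pairs(toy))  # [('A', 'B')]
```

Working from genome coordinates rather than from mRNA-to-mRNA complementarity means a chimeric transcript cannot produce a pair unless both genes really map to intersecting positions on the same chromosome.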

As shown in Table 1, among 774 overlapping protein coding genes in the human genome, 542 had overlapping exons. In 299 pairs of genes with overlapping exons, coding sequence was involved, and in 57 cases, coding sequences from both genes are coded by the same genomic fragment. From all human overlapping genes, 53% had tail-to-tail overlap (3′ to 3′), 30.23% showed head-to-head overlap (5′ to 5′), and 16.28% represented embedded genes. In the mouse genome, we found 578 pairs of protein-coding overlapping genes, and 455 of these pairs had overlapping exons. Of these, 232 pairs of genes with overlapping exons had coding sequence involved, and among these 31 pairs showed overlap between coding sequences of both genes. In mouse, 54.32% of genes overlapped at the 3′-ends, 36.51% of overlaps were head-to-head overlaps, and 9.17% of the gene pairs had one gene embedded into another. The fraction of gene pairs overlapping at the 5′-ends is significantly higher than previously reported by Shendure and Lehner, who found that only 5.53% (Shendure and Church 2002) and 15% (Lehner et al. 2002) of overlapping genes had head-to-head orientation. However, results similar to ours were presented by Yelin et al. (2003), who found that 31% of identified human overlapping genes overlap at the 5′-end. We also found 18 cases in the human genome and eight in mouse where one gene had exons overlapping with exons of not one but two different genes. These genes represent previously unreported triplets of overlapping genes. Table 2 lists all cases of such overlapping triplets. An example with three human overlapping genes (MUTYH, TOE1, and TESK2) is presented in Figure 1. The gene TOE1 has overlapping exons with MUTYH at the 5′-end and with TESK2 at the 3′-end. In the human genome we also found a segment with four exon-overlapping genes: LOC338549, IDI2, HT009, and IDI1.
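
The three overlap categories above (tail-to-tail, head-to-head, embedded) can be told apart from coordinates alone when the two genes lie on opposite strands. The sketch below is an illustration under that assumption, not the authors' classification code; the function name and the `(start, end)` tuple convention are hypothetical.

```python
def classify_overlap(plus, minus):
    """Classify the overlap between a plus-strand and a minus-strand gene.

    Each argument is a (start, end) tuple of genomic coordinates with
    start <= end. On the plus strand the 5' end is at `start`; on the
    minus strand the 5' end is at `end`.
    """
    p_start, p_end = plus
    m_start, m_end = minus

    # No shared positions at all.
    if p_end < m_start or m_end < p_start:
        return "none"

    # One span lies entirely within the other.
    if (p_start <= m_start and m_end <= p_end) or (
        m_start <= p_start and p_end <= m_end
    ):
        return "embedded"

    # Plus gene lies to the left: the shared region holds both 3' ends.
    if p_start < m_start:
        return "tail-to-tail"

    # Minus gene lies to the left: the shared region holds both 5' ends.
    return "head-to-head"


if __name__ == "__main__":
    print(classify_overlap((100, 500), (400, 900)))   # tail-to-tail
    print(classify_overlap((400, 900), (100, 500)))   # head-to-head
    print(classify_overlap((100, 1000), (300, 600)))  # embedded
```

The classification uses whole-gene spans; deciding whether exons or coding sequence are involved, as in Table 1, would additionally require exon and CDS coordinates for each gene.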

+++++