'Dark Matter' of the Genome Revealed Through Analysis of 29 Mammals
ScienceDaily (Oct. 12, 2011) — An international team of researchers has discovered the vast majority of the so-called "dark matter" in the human genome, by means of a sweeping comparison of 29 mammalian genomes. The team, led by scientists from the Broad Institute, has pinpointed the parts of the human genome that control when and where genes are turned on. This map is a critical step in interpreting the thousands of genetic changes that have been linked to human disease.
Rendering of DNA. (Credit: iStockphoto/Martin McCarthy)
Their findings appear online October 12 in the journal Nature.
Early comparison studies of the human and mouse genomes led to the surprising discovery that the regulatory information that controls genes dwarfs the information in the genes themselves. But these studies were indirect: they could infer the existence of these regulatory sequences but could locate only a small fraction of them. These mysterious sequences have been referred to as the dark matter of the genome, analogous to the unseen matter and energy that make up most of the universe.
This new study enlisted a menagerie of mammals -- including rabbit, bat, elephant, and more -- to reveal these mysterious genomic elements.
Over the last five years, the Broad Institute, the Genome Institute at Washington University, and the Baylor College of Medicine Human Genome Sequencing Center have sequenced the genomes of 29 placental mammals. The research team compared all of these genomes, 20 of which are first reported in this paper, looking for regions that remained largely unchanged across species.
"With just a few species, we didn't have the power to pinpoint individual regions of regulatory control," said Manolis Kellis, last author of the study and associate professor of computer science at MIT. "This new map reveals almost 3 million previously undetectable elements in non-coding regions that have been carefully preserved across all mammals, and whose disruptions appear to be associated with human disease."
These findings could yield a deeper understanding of disease-focused studies, which look for genetic variants closely tied to disease.
"Most of the genetic variants associated with common diseases occur in non-protein coding regions of the genome. In these regions, it is often difficult to find the causal mutation," said first author Kerstin Lindblad-Toh, scientific director of vertebrate genome biology at the Broad and a professor in comparative genomics at Uppsala University, Sweden. "This catalog will make it easier to decipher the function of disease-related variation in the human genome."
This new map helps pinpoint the mutations most likely to be responsible for disease: they fall in sequences that have been preserved across millions of years of evolution but are commonly disrupted in individuals who suffer from a given disease. Knowing the causal mutations and their likely functions can then help uncover the underlying disease mechanisms and reveal potential drug targets.
The scientists were able to suggest possible functions for more than half of the 360 million DNA letters contained in the conserved elements, revealing the hidden meaning behind the As, Cs, Ts, and Gs. Their analysis revealed:
Almost 4,000 previously undetected exons, or segments of DNA that code for protein
10,000 highly conserved elements that may be involved in how proteins are made
More than 1,000 new families of RNA secondary structures with diverse roles in gene regulation
2.7 million predicted targets of transcription factors, proteins that control gene expression
...
Read more here/Leia mais aqui: Science Daily
+++++
Nature
A high-resolution map of human evolutionary constraint using 29 mammals
Kerstin Lindblad-Toh et al.
Nature (2011)
doi:10.1038/nature10530
Abstract
The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ~60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.
+++++