Scientists Launch Effort to Sequence the DNA of 10,000 Vertebrates

Wednesday, November 4, 2009

Project Will Help Explain Evolutionary Mysteries

Scientists have an ambitious new strategy for untangling the evolutionary history of humans and their biological relatives: a genetic menagerie made of the DNA of more than 10,000 vertebrate species. The plan, proposed by an international consortium of scientists, is to obtain, preserve, and sequence the DNA of approximately one species for each genus of living mammals, birds, reptiles, amphibians, and fish.

“Understanding the evolution of the vertebrates is one of the greatest detective stories in science,” said David Haussler, a Howard Hughes Medical Institute investigator at the University of California, Santa Cruz (UCSC). “No one has ever really known how the elephant got its trunk, or how the leopard got its spots. This project will lay the foundation for work that will answer those questions and many others.”

Known as the Genome 10K Project, the approximately $50 million initiative is “tremendously exciting science that will have great benefits for human and animal health,” Haussler said. “Within our lifetimes, we could get a glimpse of the genetic changes that have given rise to some of the most diverse life forms on the planet.”

Haussler is one of the lead authors of an article, published online November 5, 2009, in the Journal of Heredity, that outlines the project. The other lead authors include Stephen J. O’Brien, chief of the Laboratory of Genomic Diversity at the National Cancer Institute, and Oliver A. Ryder, director of genetics at the San Diego Zoo’s Institute for Conservation Research and adjunct professor of biology at the University of California, San Diego. Coauthors and additional authors, who together make up a group called the Genome 10K Community of Scientists (G10KCOS), include geneticists, paleontologists, ecologists, conservationists, and other scientists representing major zoos, museums, research centers, and universities around the world.

The proposal originated at a meeting Haussler hosted at UCSC in April 2009. More than 50 scientists came together to discuss the merits of the project and its daunting logistical and financial challenges. “Some of the people at the meeting were initially skeptical,” Haussler said. “But they quickly recognized the many advantages of a shared infrastructure and data analysis system.”

The primary impetus behind the proposal is the rapidly expanding capability of DNA sequencers and the associated decline in sequencing costs. “We’ll soon be in a situation where it will cost only a few thousand dollars to sequence a genome,” Haussler said. “At that point, most of the cost will be getting samples, managing the project, and handling data.”

All living vertebrates descend from a single marine species that lived 500-600 million years ago. Paleontologists do not know much about the physical appearance of that species, but because all of its descendants share certain characteristics, they know that it had segmented muscles; a forebrain, midbrain, and hindbrain attached to spinal cord structures; and a sophisticated innate immune system.

That primitive vertebrate gave rise to what Haussler calls “one of the most spectacularly malleable branches of life.” Vertebrates spread throughout the oceans, conquered land, and eventually took to the air. Over the course of time they produced stunning innovations, including multichambered hearts, bones and teeth, an internal skeleton that has supported the largest aquatic and terrestrial animals on the planet, and a species of primate -- Homo sapiens -- that has produced sophisticated language, culture, and technology.

By sequencing the DNA of 10,000 vertebrates -- roughly one-sixth of the 60,000 species estimated to be living today -- biologists will be able to reconstruct the genetic changes that gave rise to this astonishing diversity. Some parts of our DNA are very similar to the DNA of other vertebrates, reflecting our descent from a common ancestor, while other parts are markedly different. “We can understand the function of elements in the human genome by seeing what parts of the genome have changed and what parts have not changed in humans and other animals,” said Haussler.

The project also will help conservation efforts by documenting the genomes and genetic diversity of threatened and endangered vertebrate species. By helping scientists predict how species will respond to climate change, pollution, emerging diseases, and invasive competitors, it will support the assessment, monitoring, and management of biological diversity.

The G10KCOS consortium has been developing guidelines for the collection, preservation, and documentation of cell lines and DNA samples. It also has been discussing potential public and private sources of funding for the project -- estimated at $50 million if the price of handling and sequencing each DNA sample eventually falls to $5,000. Said Haussler: “How do you raise $50 million? Ask nicely and make a strong case.”
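The $50 million figure follows directly from the numbers quoted above: roughly 10,000 species at a hoped-for $5,000 per sample. A minimal sketch of that arithmetic is shown here; the function name `project_cost` is purely illustrative, and the second call uses a hypothetical per-sample price that does not come from the consortium.

```python
def project_cost(num_species: int, cost_per_sample_usd: int) -> int:
    """Total cost if every species costs the same to handle and sequence."""
    return num_species * cost_per_sample_usd


# Figures quoted in the article: 10,000 species at $5,000 per sample.
target = project_cost(10_000, 5_000)
print(f"${target:,}")

# Hypothetical illustration only: the same project at $50,000 per sample.
print(f"${project_cost(10_000, 50_000):,}")
```

Because the total scales linearly with the per-sample price, the budget depends almost entirely on how far sequencing and handling costs fall.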

In planning the project, the G10KCOS group has used the Human Genome Project as a model. For example, the consortium plans to release sequencing data immediately according to standards developed for the sequencing of the human genome. Haussler also cited that project, which began before needed sequencing technologies were available, as evidence that it is worthwhile to begin planning for the Genome 10K Project before the cost of sequencing falls enough to make it feasible. “The time to start is now, or the job will get away from us,” said Haussler. “The sequencing machines will be waiting, but the samples won’t be ready.”

+++++

Source

+++++

Algorithms for Genome Analysis

Summary:

David Haussler is developing new statistical and algorithmic methods to explore the molecular evolution of the human genome, integrating cross-species comparative and high-throughput genomics data to study gene structure, function, and regulation.

My genome informatics team has participated in the public consortium efforts to produce, assemble, and annotate the first mammalian genomes. As collaborators in the Human Genome Project, we built the program that assembled the first working draft of the human genome sequence from information produced by sequencing centers worldwide, and we participated in the informatics associated with the finishing effort. We provide an interactive genome browser for the human, mouse, rat, and other genomes that is used by thousands of biomedical researchers every day (genome.ucsc.edu). By integrating multiple sets of high-throughput genomics data, computational predictions, and curated genomic feature sets from dozens of laboratories, the browser provides a new kind of computational microscope for exploring genomes.

Our work developing and annotating genomes for the browser provides a foundation for our scientific efforts. These are directed at the large-scale discovery and characterization of the functional elements in mammalian genomes through comparative sequence analysis, the study of mammalian molecular evolution, and the integration of an increasing variety of high-throughput data sets provided by functional genomics efforts.

In the approximately 75 million years since the human lineage diverged from the lineage leading to the rat and mouse, the three genomes have independently accumulated many changes, producing the three species we see today. Reconstructing these changes by computational analysis has given us a new understanding of mammalian genome evolution. In comparisons of the human, mouse, and rat genomes, we have found that the rate of neutral substitution varies regionally along the chromosomes. The mechanistic explanation of this variation has not yet been found. We determined that a core of about 40 percent of the human, rat, and mouse genome sequences derives from a common ancestor, and we produced base-level alignments between the three genomes in these regions. This alignment, combined with characterization of neutral substitution rates, led to the estimate that at least 5 percent of the human genome is under negative selection; changes to the bases in these regions reduce fitness, and hence seldom become established in the population.
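To make the idea of regional rate variation concrete, here is a toy sketch that counts differing bases in fixed windows of a pairwise alignment. It is not the method used by the team: real neutral-rate estimates restrict the comparison to sites assumed to be neutral and use substitution models that correct for multiple changes at the same site. The function name `windowed_divergence` and the example sequences are invented for illustration.

```python
def windowed_divergence(seq_a, seq_b, window):
    """Fraction of differing bases per non-overlapping window of an alignment.

    Columns where either sequence has a gap ('-') are skipped. A window with
    no comparable columns yields None.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    rates = []
    for start in range(0, len(seq_a), window):
        compared = 0
        differing = 0
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        for a, b in pairs:
            if a == "-" or b == "-":
                continue  # gap column: nothing to compare
            compared += 1
            if a != b:
                differing += 1
        rates.append(differing / compared if compared else None)
    return rates


# Made-up aligned sequences: identical first window, two changes in the second.
print(windowed_divergence("ACGTACGT", "ACGTTCGA", window=4))
```

A region whose windows consistently show much lower divergence than the local neutral rate is a candidate for negative selection, which is the intuition behind the 5 percent estimate.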

We suspect that these conserved regions contain the most functionally important elements of the genome and point to areas where intensified study will lead to a better understanding of how the genome works. Because only 1.5 percent of the genome is coding, this rough estimate, if it holds up, implies that at least an additional 3.5 percent of the genome is functionally important noncoding DNA. Some of these noncoding regions are "ultraconserved," showing almost no change for hundreds of millions of years. We have confirmed that negative selection is three times stronger in these regions than it is for nonsynonymous changes in coding regions. It is a mystery what molecular mechanisms would place virtually every base in a segment up to 1 kilobase long under this level of negative selection. Our goal over the next several years is to characterize these regions computationally and in many cases also functionally, through wet-lab experiments.
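The 3.5 percent figure is a simple subtraction, and it helps to see what it means in bases. The sketch below assumes that all coding DNA counts as being under selection and uses a round genome size of 3 billion bases; that size is an assumption for illustration and is not stated in the text.

```python
from fractions import Fraction

UNDER_SELECTION = Fraction(5, 100)  # "at least 5 percent" (from the text)
CODING = Fraction(15, 1000)         # "only 1.5 percent" (from the text)
GENOME_BASES = 3_000_000_000        # assumed round figure, not from the text


def noncoding_functional_fraction(selected, coding):
    """Lower bound on selected noncoding DNA, assuming all coding DNA is selected."""
    return selected - coding


frac = noncoding_functional_fraction(UNDER_SELECTION, CODING)
print(f"{float(frac):.1%} of the genome")
print(f"about {int(frac * GENOME_BASES):,} bases")
```

Exact fractions are used instead of floats so that the subtraction stays exact; under these assumptions the noncoding share comes to roughly 105 million bases.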

In an attempt to build realistic and information-rich mathematical models of molecular evolution, we have undertaken larger, multispecies comparisons. Some of these models are tailored to specific kinds of functional elements, such as coding exons and transcription factor–binding sites (in conjunction with the National Human Genome Research Institute ENCODE project). These models should identify elements under negative selection with higher sensitivity and specificity than was possible with two-species comparisons. Ultimately we hope to explore the full spectrum of events in mammalian molecular evolution, including insertions, deletions, duplications, inversions, and rearrangements. As the number of genomes grows, our goal is to produce increasingly accurate analyses of the evolutionary history of each base in the human genome as a basis for genome-wide functional analysis.

Our work has revealed some unexpected origins for some ultraconserved elements. Multiple close copies of one of these critical DNA sequences in our genome can be traced to our common ancestor with the coelacanth, a descendant of the ancient marine organism that gave rise to the terrestrial vertebrates more than 360 million years ago. These sequences appear to derive from DNA elements known as retroposons, which are evolutionarily derived from retroviruses. In the coelacanth, the segments were produced by a retroposon known as a short interspersed repetitive element, or SINE, which is a piece of DNA that can make copies of itself and insert those copies elsewhere in an organism's genome. Wet-lab tests have confirmed that one of these segments regulates a nearby neurodevelopmental gene. Thus, the movement of retroposons can generate evolutionary experiments by adding new regulatory modules to genes, and for as yet unknown reasons, these can occasionally become ultraconserved.

Our other work has confirmed that this process of regulatory network expansion by retroposon movement is widespread. For example, we estimate that one-third of the binding sites for the tumor-suppressor gene p53 in our genome are specific to primates and were put in place through expansion of a particular family of endogenous retroviruses (a type of retroposon) about 40 million years ago. This significantly expanded the regulatory network of p53 in primates.

We have also begun to explore sudden change in noncoding regions of the genome that have previously been highly conserved by negative selection. Comparing our genome to that of our closest relative, the chimpanzee, we found the most dramatic example of evolutionary acceleration in a novel RNA gene that is expressed specifically in neurons in the developing human neocortex during a critical period for cortical neuron specification and migration. This and other regions of accelerated change in the human genome provide exciting new candidates in the search for uniquely human biology.

This work is funded in part by grants from the National Human Genome Research Institute, the National Cancer Institute, the National Institute on Drug Abuse, and the California Institute for Quantitative Biomedical Research (QB3).

Last updated August 21, 2008