Os cientistas não sabem o que metade dos genes microbianos realmente fazem.

quarta-feira, maio 25, 2022

Unifying the known and unknown microbial coding sequence space

Chiara Vanni, Matthew S Schechter, Silvia G Acinas, Albert Barberán, Pier Luigi Buttigieg, Emilio O Casamayor, Tom O Delmont, Carlos M Duarte, A Murat Eren, Robert D Finn, Renzo Kottmann, Alex Mitchell, Pablo Sánchez, Kimmo Siren, Martin Steinegger, Frank Oliver Gloeckner, Antonio Fernàndez-Guerra Is a corresponding author see less

Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine Microbiology, Germany; Jacobs University Bremen, Germany; Department of Medicine, University of Chicago, United States; Department of Marine Biology and Oceanography, Institut de Ciències del Mar (CSIC), Spain; Department of Environmental Science, University of Arizona, United States; Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Alfred Wegener Institute, Germany; Center for Advanced Studies of Blanes CEAB-CSIC, Spanish Council for Research, Spain; Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, France; Red Sea Research Centre and Computational Bioscience Research Center, King Abdullah University of Science and Technology, Saudi Arabia; Josephine Bay Paul Center, Marine Biological Laboratory, United States; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, United Kingdom; Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen, Denmark; School of Biological Sciences, Seoul National University, Republic of Korea; Institute of Molecular Biology and Genetics, Seoul National University, Republic of Korea; University of Bremen and Life Sciences and Chemistry, Germany; Computing Center, Helmholtz Center for Polar and Marine Research, Germany; Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Denmark.

Mar 31, 2022 

https://doi.org/10.7554/eLife.67667



Abstract

Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40–60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration on how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.

Editor's evaluation

In this paper, the authors develop a sensitive and specific computational workflow for comprehensively summarizing known and unknown gene content across large collections of genomes and metagenomes. In addition to clustering and categorizing genes on a large scale, the authors show how to use their approach to both explore lineage-specific genes and generate hypotheses for the function of unknown genes.

https://doi.org/10.7554/eLife.67667.sa0

eLife digest
It is estimated that scientists do not know what half of microbial genes actually do. When these genes are discovered in microorganisms grown in the lab or found in environmental samples, it is not possible to identify what their roles are. Many of these genes are excluded from further analyses for these reasons, meaning that the study of microbial genes tends to be limited to genes that have already been described.

These limitations hinder research into microbiology, because information from newly discovered genes cannot be integrated to better understand how these organisms work. Experiments to understand what role these genes have in the microorganisms are labor-intensive, so new analytical strategies are needed.

To do this, Vanni et al. developed a new framework to categorize genes with unknown roles, and a computational workflow to integrate them into traditional analyses. When this approach was applied to over 400 million microbial genes (both with known and unknown roles), it showed that the share of genes with unknown functions is only about 30 per cent, smaller than previously thought. The analysis also showed that these genes are very diverse, revealing a huge space for future research and potential applications. Combining their approach with experimental data, Vanni et al. were able to identify a gene with a previously unknown purpose that could be involved in antibiotic resistance.

This system could be useful for other scientists studying microorganisms to get a more complete view of microbial systems. In future, it may also be used to analyze the genetics of other organisms, such as plants and animals.

FREE PDF GRATIS: eLIFE