Gene name errors are widespread in the scientific literature
Mark Ziemann, Yotam Eren and Assam El-Osta
DOI: 10.1186/s13059-016-1044-7 © The Author(s). 2016
Published: 23 August 2016
The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating-point numbers. A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.
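The kind of check such a scan performs can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Bash scripts: the two regular expressions, the function names `classify_cell` and `flag_suspect_symbols`, and the sample column are all assumptions made for this example. The patterns target the commonly reported conversions, such as SEPT2 becoming "2-Sep", MARCH1 becoming "1-Mar", and identifiers like 2310009E13 becoming "2.31E+13".

```python
import re

# Assumption: Excel's default display of a date-converted symbol is
# day-month ("2-Sep") or month-day ("Sep-2") with a three-letter month.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{MONTHS})|(?:{MONTHS})-\d{{1,2}})$", re.IGNORECASE
)

# Assumption: a float-converted identifier is shown in scientific
# notation, e.g. "2.31E+13".
FLOAT_LIKE = re.compile(r"^\d(?:\.\d+)?E\+\d+$", re.IGNORECASE)


def classify_cell(value: str) -> str:
    """Return "date", "float" or "ok" for one cell of a gene-name column."""
    text = value.strip()
    if DATE_LIKE.match(text):
        return "date"
    if FLOAT_LIKE.match(text):
        return "float"
    return "ok"


def flag_suspect_symbols(column):
    """Return (row index, value, kind) for every cell that looks converted."""
    flagged = []
    for index, value in enumerate(column):
        kind = classify_cell(value)
        if kind != "ok":
            flagged.append((index, value, kind))
    return flagged


if __name__ == "__main__":
    # Hypothetical gene-list column, for illustration only.
    column = ["TP53", "2-Sep", "1-Mar", "2.31E+13", "BRCA1"]
    print(flag_suspect_symbols(column))
    # [(1, '2-Sep', 'date'), (2, '1-Mar', 'date'), (3, '2.31E+13', 'float')]
```

A pattern-based check like this can only flag candidates. A flagged cell still needs manual review, because the original symbol cannot always be recovered from the converted value.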
Keywords: Microsoft Excel, Gene symbol, Supplementary data
Abbreviations
GEO: Gene Expression Omnibus
JIF: journal impact factor
Acknowledgements
We thank A. Kaspi and H. Rafehi for discussions on this paper, and R. Lazarus for informatics support.
Funding
AEO is supported by the National Health and Medical Research Council (NHMRC GNT0526681, GNT1048377), the Juvenile Diabetes Research Foundation (JDRF 5-2008-298, 27-2012-451), the Diabetes Australia Research Trust (DART) and, in part, the Victorian Government’s Operational Infrastructure Support program.
Availability of data and materials
Bash scripts, URLs and output data supporting the conclusions of this article are available in the SourceForge repository
Authors’ contributions
MZ, YE and AEO designed and conducted analyses and co-wrote the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Ethics approval and consent to participate
No ethical approval was required.
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Additional file 1: Table S1. List of supplementary files containing Excel gene name errors from journals and Gene Expression Omnibus (GEO). (XLSX 81 kb)
Source: Genome Biology