The first Korean genome sequence and analysis: full genome sequencing for a socio-ethnic group
Sung-Min Ahn1,5*, Tae-Hyung Kim2*, Sunghoon Lee2*, Deokhoon Kim1, Ho Ghang2,
Dae-Soo Kim2, Byoung-Chul Kim2, Sang-Yoon Kim2, Woo-Yeon Kim2, Chulhong Kim2,
Daeui Park2, Yong Seok Lee2, Sangsoo Kim3, Rohit Reja2, Sungwoong Jho2, Chang Geun
Kim6, Ji-Young Cha1, Kyung-Hee Kim4, Bonghee Lee1, Jong Bhak2§, and Seong-Jin Kim1§
1Lee Gil Ya Cancer and Diabetes Institute, Gachon University of Medicine and Science, Incheon, Korea
2Korean BioInformation Center (KOBIC), KRIBB, Daejeon, Korea
3Department of Bioinformatics & Life Science, Soongsil University, Seoul, Korea
4Department of Laboratory Medicine, Gachon University Gil Hospital, Incheon, Korea
5Department of Translational Medicine, Gachon University Gil Hospital, Incheon, Korea
6National Center for Standard Reference Data, Korea Research Institute of Standards and Science, Daejeon, Korea
*These authors contributed equally to this work.
§Corresponding authors
Email: jongbhak@yahoo.com and jasonsjkim@gachon.ac.kr
Tel: 82-42-879-8500 Fax: 82-42-879-8519
Abstract
We present the first Korean individual genome sequence (SJK) and analysis results. The diploid genome of a Korean male was sequenced to 28.95-fold redundancy using the
Illumina paired-end sequencing method. SJK covered 99.9% of the NCBI human reference
genome. We identified 420,083 novel SNPs that are not in the dbSNP database.
Despite their close similarity, significant differences were observed between the Chinese genome (YH), the only other Asian genome available, and SJK: 1) 39.87% (1,371,239 out of 3,439,107) of SNPs were SJK-specific (49.51% against Venter’s, 46.94% against Watson’s, and 44.17% against the Yoruba genomes), 2) 99.5% (22,495 out of 22,605) of short indels (< 4 bp) discovered at the same loci had the same size and type as in YH, and 3) 11.3% (331 out of 2,920) of deletion structural variants were SJK-specific. Even after attempting to map the unmapped reads of SJK to unanchored NCBI scaffolds, HGSV, and available personal genomes, 5.77% of SJK reads still could not be mapped. All these findings indicate that the overall genetic differences among individuals from closely related ethnic groups may be significant. Hence, constructing reference genomes for minor socio-ethnic groups will be useful for massive individual genome sequencing.
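The percentages quoted for the SJK versus YH comparison follow directly from the counts given in parentheses. A minimal sketch of that arithmetic is shown here; the dictionary labels and the `percent` helper are illustrative names chosen for this example, not part of the original analysis pipeline.

```python
# Counts quoted in the abstract for the SJK vs. YH comparison:
# label -> (numerator, denominator). Labels are illustrative only.
counts = {
    "SJK-specific SNPs": (1_371_239, 3_439_107),
    "concordant short indels": (22_495, 22_605),
    "SJK-specific deletion SVs": (331, 2_920),
}


def percent(numerator, denominator):
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


# Recompute each reported percentage from its raw counts.
percentages = {label: percent(n, d) for label, (n, d) in counts.items()}

for label, value in percentages.items():
    print(f"{label}: {value:.2f}%")
```

Rounded to the precision used in the abstract, these ratios agree with the reported 39.87%, 99.5%, and 11.3%.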