Korean Genome Project: 1094 Korean personal genomes with clinical information
Science Advances  27 May 2020
Sungwon Jeon et al.

On the basis of our analysis, Korean population is genetically homogeneous compared to other East Asians, and this is probably due to geopolitical isolation in the past thousands of years.

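The authors state this plainly at the start of the DISCUSSION. In the PCA plot as well, the Korean samples form a tighter cluster than even the Japanese, who are themselves highly homogeneous genetically.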

Restricted to populations of more than 10 million, Koreans are quite possibly the most genetically homogeneous population in the world.

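Judging from the Methods, the PCA plot is conclusive, and taken together with Korean history it is reasonable to regard Koreans as an unusually homogeneous population. The Japanese and the Finns are usually treated as highly homogeneous because both populations expanded rapidly, but this whole-genome sequencing paper indicates that Koreans are considerably more homogeneous still. The Discussion's explicit statement that Koreans are more homogeneous than the Chinese and Japanese supports this reading.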

①異常に高い遺伝的な均質性

小規模集団では、高い遺伝的均質性は、全体として知能の低下をきたすことは経験則として良く知られている

韓国は2018年に調べた結果、全60以上もの科学関係の国際的な賞の受賞者が全くいない

②異常に高い遺伝的な均質性

人類は、経験則として、異常に高い遺伝的な均質性=俗な言い方で、孤立した村で「血が濃くなってしまう」と奇形児等肉体的にはさほど大きな影響は出ないが、精神面では、成長後「変な子」が多くなることを知っている

韓国人どもは、非同義変異が他集団に比べ多いこと、キチガイどもに固有の変異が一部の者に異常に集中して多い。上と同じことである

全ての謎が解けたような気がする


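* The supplementary materials are collected in a PDF at a separate URL. The figure below, from the supplement, shows the Korean samples only.
The cluster is almost perfectly circular, which suggests that the inbreeding coefficient is probably fairly high. In a Japanese dataset of more than 10,000 samples the cluster is elliptical. (PDF supplementary materials)

[Figure: screenshot from the supplementary PDF, Korean samples only]
* Fig. S4 clearly shows how unusual the variant distribution of the Korean samples is in a large sample.

[Figure: screenshot of fig. S4]

From this, only the portion comparing Koreans with the Japanese was extracted.
[Figure: excerpt of the comparison with the Japanese]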

Abstract

Also, Korea1K, as a reference, showed better imputation accuracy for Koreans than the 1KGP panel.

A new reference panel called Korea1K has apparently been built. That is about the only meaningful content in the Abstract.

INTRODUCTION

In East Asia, the 1KJPN project yielded data on 1070 Japanese genomes (17), and another recent dataset identified selection signatures in the Japanese population from 2234 Japanese whole-genome data (18)

Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese.

Here, we introduce a dataset comprising 1094 Korean whole genomes of which 1007 genomes were newly generated in combination with systematically acquired clinical and biochemical measurement information from the blood and urine of the participants.

RESULTS

SNVs and indels in Korea1K dataset


Whole-genome sequencing (WGS) data from 1007 blood or saliva samples (984 samples with clinical and biochemical information) were generated with an average sequencing depth of 31× and pooled with sequencing data from an additional 87 blood or saliva samples (without clinical information) from the KoVariome database

In total, 1094 complete genomes, including 916 unrelated and healthy individuals, mostly from the Ulsan metropolitan region, were compared to the human genome reference


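Most of the 1,007 newly sequenced samples were collected in Ulsan Metropolitan City in Korea's Gyeongsang region. Combined with the 87 previously published samples from the KoVariome database, the total is 1,094.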

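The paper appears in Science Advances, the open-access multidisciplinary journal published by the same association as Science. I was surprised that the article processing charge exceeds 500,000 yen even at 100 yen to the dollar. I did not see an "accepted" date on the article page, but Science Advances is a peer-reviewed journal.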

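More puzzling is that the sequence data have not been released. KoVariome has been released. This is the first large-sample Korean dataset and probably cost on the order of 100 million yen, so one would normally expect the data to be published, as BioBank Japan does. As a result, the only publicly available Korean whole-genome sequence data at present are the roughly 100 individuals analyzed in the FDA paper.
The possibility that the data are being withheld deliberately cannot be ruled out.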


We divided the variants into five categories based on their allele frequency in the Korean population (singleton: allele count = 1; doubleton: allele count = 2; rare: allele count of >2 and allele frequency of ≤0.01; common: allele frequency of >0.01 but ≤0.05; and very common: allele frequency of >0.05; Fig. 1A).

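Five categories:
① singleton: allele count = 1
② doubleton: allele count = 2
③ rare: allele count > 2 and allele frequency ≤ 0.01 (1%)
④ common: 1% < allele frequency ≤ 5%
⑤ very common: allele frequency > 5%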
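
As a rough illustration of the five-way split above, the thresholds can be written as a small Python function. This is only a sketch: the function name `classify_variant` and the assumption that every site has all 2 × 1094 = 2188 alleles called are mine, not the paper's.

```python
def classify_variant(allele_count: int, allele_number: int = 2188) -> str:
    """Assign a variant to one of the five frequency classes quoted above.

    allele_count  : number of copies of the alternate allele observed
    allele_number : total number of called alleles at the site
                    (hypothetical default: 2 x 1094 diploid genomes)
    """
    if allele_count < 1 or allele_count > allele_number:
        raise ValueError("allele_count must be between 1 and allele_number")

    # Singletons and doubletons are defined by count, not by frequency.
    if allele_count == 1:
        return "singleton"
    if allele_count == 2:
        return "doubleton"

    # Everything else is classified by allele frequency.
    frequency = allele_count / allele_number
    if frequency <= 0.01:
        return "rare"
    if frequency <= 0.05:
        return "common"
    return "very common"


for count in (1, 2, 21, 22, 110):
    print(count, classify_variant(count))
```

Because singletons and doubletons are defined by count, their frequency is never consulted; with 2188 alleles both fall well below the 1% threshold anyway.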

Highlighting the power of our large dataset, approximately half of the variants that we identified were classified as singleton or doubleton (allele count of ≤2), and unexpectedly, more than 70% of them are not reported in dbSNP (v150) (20).

More than 70% of the singletons and doubletons are reported as novel. That is only to be expected.

On the other hand, less than 20% of the variants were classified as very common (allele frequency of >0.05), with more than 94% of these variants previously reported in dbSNP (v150)
So fewer than 6% of the very common variants are newly discovered.

On the basis of the final set of variants, each individual showed on average ~4.42 M variants (3.58 M very common, 0.4 M common, 0.31 M rare, 0.46 M doubleton, and 0.85 M singleton variants), of which 8928 and 918 were nonsynonymous and loss of function (LoF), respectively.
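The passage above is the summary of the first stage: each sample carries on average about 4.42 million variants (SNVs and indels), of which 8,928 are nonsynonymous and 918 are loss-of-function (LoF) variants. Note that the five per-category figures as quoted here add up to about 5.6 M rather than 4.42 M, so the doubleton and singleton values may have been mis-transcribed.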





Next, we classified each variant into 1 of 19 different variant classes (i.e., intergenic and intronic) based on its functional impact and location in the genome (fig. S5).



LoF variants (nonsense, nonstop, splicing site, and indel variants) in the Korea1K set had a higher ratio of rare, doubleton, or singleton variants than other regional classes, indicating the effect of purifying selection on these variants. In addition, the allele (site) frequency spectrum of unrelated individuals was used to estimate the fraction under selection pressure in different genomic regions (21). We confirmed that LoF variants had the highest fraction of sites under negative selection (fig. S6).


We applied the same comparative analysis to the entire gene set and found that 16 genes showed high purifying selection pressure, which was even stronger than the selection for nonsynonymous variants across the genome.
(This part is important.)

The Korea1K set contained 266,081 nonsynonymous SNVs. Among them, 118,417 and 117,414 were categorized as protein damaging by PolyPhen (24) (possibly damaging, 46,116; probably damaging, 72,301) and SIFT (25) (deleterious, 117,414), respectively.

In total, 87,671 variants were predicted as protein damaging by both programs, and their allele frequency is skewed toward rare frequencies, while benign or tolerated variants are skewed toward common frequencies, again indicating purifying selection (fig. S11).
The description up to this point is what might be called the second stage.
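
To see how the quoted PolyPhen and SIFT counts relate to each other, the arithmetic can be checked in a few lines of Python. This is a sketch using only the figures quoted above; the variable names are mine, and the "either tool" total is derived by inclusion-exclusion rather than taken from the paper.

```python
# Counts quoted from the paper.
nonsynonymous_total = 266_081
polyphen = {"possibly damaging": 46_116, "probably damaging": 72_301}
sift_deleterious = 117_414
damaging_by_both = 87_671

# The two PolyPhen sub-categories should add up to the quoted PolyPhen total.
polyphen_total = sum(polyphen.values())

# Inclusion-exclusion: variants flagged by at least one of the two tools.
damaging_by_either = polyphen_total + sift_deleterious - damaging_by_both

print(f"PolyPhen damaging total : {polyphen_total:,}")
print(f"Flagged by either tool  : {damaging_by_either:,}")
print(f"Share of PolyPhen calls confirmed by SIFT : {damaging_by_both / polyphen_total:.1%}")
print(f"Share of SIFT calls confirmed by PolyPhen : {damaging_by_both / sift_deleterious:.1%}")
print(f"Share of nonsynonymous SNVs flagged by either tool : {damaging_by_either / nonsynonymous_total:.1%}")
```

The two tools agree on roughly three quarters of each other's calls, which is why the paper restricts the allele-frequency comparison to the variants flagged by both.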


When mitochondrial and chromosomal Y haplogroups among the Korean individuals (figs. S12 and S13) were investigated,
Haplogroups are also analyzed, for what it is worth, but there is nothing new here.

Genomic features of Koreans compared to other populations

We assessed the genetic distinctiveness of our Korea1K sample using principal components analysis (PCA) with the small size variants (SNP and indel) in our dataset and 1KGP.

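[Figure: PCA plot]
(The Japanese and Korean samples are plotted quite far apart. The PCA plot can be said to support the dual-structure hypothesis for the origin of the Japanese.)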


we found that those three populations clustered distinctly from each other (Fig. 2B). This pattern was replicated by ADMIXTURE analysis with K = 3 (fig. S14).

To investigate functionally relevant variants, we extracted 1048 ClinVar pathogenic variants found in Korea1K. Among them, 242 variants had an allele frequency greater than 0.1 in Korea1K, which is high for pathogenic variants (fig. S15).

This passage is important. The bold and red highlighting was added by me.
We also found 35 drug-response variants annotated in ClinVar (fig. S16), and 11 of them displayed significantly different allele frequencies from those of the Chinese or Japanese individuals in the 1KGP set, highlighting the importance of population-specific datasets when interpreting pathogenic or drug-response variants.

For example, the variant rs4961 in ADD1 had the highest frequency in the Korea1K compared to other populations and is associated with hypertension and responsiveness to furosemide and spironolactone as shown in a European study (30, 31).

高血圧ではなく、キチガイどものオツムに影響が出ている。解明されるのは、かなり先だが・・・。

TE insertions with significantly different allele frequencies between Koreans and 26 other populations in the 1KGP set were enumerated, and as expected, Korea1K displayed significantly fewer differential TE insertions compared to East Asian populations than non–East Asians (Fig. 2, C and D and extended data table S2). Furthermore, ALU and SINE-VNTR-ALUs (SVA) displayed a greater proportion of differential TE insertions than long interspersed nuclear element (LINE) in JPT, CHB, and CHS, probably because of different insertion rates on the TE types.

 
HLA types A*24:02, A*26:01, A*31:01, B*40:02, and B*52:01 displayed significantly lower allele frequencies in the Korean population relative to the Japanese population (Fisher’s exact test P = 3.61 × 10^−49, 7.09 × 10^−8, 1.34 × 10^−12, 9.61 × 10^−12, and 3.13 × 10^−42, respectively), while types A*33:03 and B*44:03 had higher allele frequencies (Fisher’s exact test P = 3.10 × 10^−46 and 1.00 × 10^−5, respectively). Although the Japanese are genetically very close to the Korean, the HLA-type profiles of these populations are considerably different.

GWAS based on clinical traits

Nothing of note in this section.

Korea1K imputation panel





The Korea1K dataset as a panel of normals for cancer genomics studies

Not of interest to me, so I did not read it.




DISCUSSION

On the basis of our analysis, Korean population is genetically homogeneous compared to other East Asians, and this is probably due to geopolitical isolation in the past thousands of years.

Method

PCA and ADMIXTURE with the 1KGP genome data

The interpopulation genomic structure was evaluated by projecting the first two PCs determined via PCA of SNVs from Korea1K samples and 1KGP without closely related individuals. We selected and merged variants from the Korea1K and 1KGP sets in accordance with the following criteria:

  1) Biallelic SNVs with a MAF of ≥5%.

  2) Biallelic SNVs with an HWE P > 10^−6.

  3) Biallelic SNVs with a missing genotype rate of <0.01.

Extracted variants were LD pruned using “--indep 50 5 2” in PLINK, yielding 153,633 sites. PCA was carried out using the EIGENSOFT program. ADMIXTURE analysis was performed from K = 2 to K = 14 based on the same variants set as PCA. We plotted an ADMIXTURE plot for K = 3, which showed the smallest cross-validation error rate across the Ks.
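
The three site-level criteria listed in the Methods can be sketched as a simple filter. This is an illustration only: the `SiteStats` record, its field names, and the example sites are hypothetical, and the per-site statistics are assumed to have been computed already. The LD-pruning step (“--indep 50 5 2” in PLINK) is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Per-site summary statistics (hypothetical record layout)."""
    variant_id: str
    n_alleles: int       # number of distinct alleles observed at the site
    maf: float           # minor allele frequency
    hwe_p: float         # Hardy-Weinberg equilibrium test P value
    missing_rate: float  # fraction of samples with no genotype call


def passes_pca_filters(site: SiteStats) -> bool:
    """Apply the three quoted criteria to one site."""
    return (
        site.n_alleles == 2            # biallelic SNVs only
        and site.maf >= 0.05           # 1) MAF of >= 5%
        and site.hwe_p > 1e-6          # 2) HWE P > 10^-6
        and site.missing_rate < 0.01   # 3) missing genotype rate < 0.01
    )


# Made-up example sites, one failing each criterion in turn.
sites = [
    SiteStats("site_ok", 2, 0.12, 0.40, 0.001),
    SiteStats("site_low_maf", 2, 0.01, 0.40, 0.001),
    SiteStats("site_hwe_fail", 2, 0.12, 1e-9, 0.001),
    SiteStats("site_missing", 2, 0.12, 0.40, 0.05),
    SiteStats("site_multiallelic", 3, 0.12, 0.40, 0.001),
]
kept = [site.variant_id for site in sites if passes_pca_filters(site)]
print(kept)
```

Restricting the PCA to common, well-genotyped sites in Hardy-Weinberg equilibrium reduces the influence of rare variants and genotyping artifacts on the leading components.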