Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (23): 104-113. DOI: 10.3778/j.issn.1002-8331.2304-0280

• Pattern Recognition and Artificial Intelligence •

Self-Attention Subtype Recognition Neural Network for Classification of Kidney Renal Clear Cell Carcinoma Data

LI Yang, CHEN Xicheng, WU Yazhou   

  1. Department of Military Medical Statistics, Department of Military Preventive Medicine, Army Medical University, Chongqing 400038, China
  • Online: 2023-12-01 Published: 2023-12-01

Abstract: To analyse transcriptomic data from kidney renal clear cell carcinoma (KIRC), an improved classification model is built on the self-attention mechanism. The proposed self-attention subtype recognition neural network (SSRNN) consists of an encoder and a classifier, with self-attention as its main improvement. After screening 358 survival-related protein-coding genes, cluster analysis indicates that three subtypes give the most suitable partition. Comparing clinical information and survival across the three subtypes, C1, C2 and C3, reveals differences in survival outcomes among the groups. SSRNN achieves the best classification performance, with an area under the curve of 93.44%. Gene expression heatmaps show that expression differs among the three subtypes and suggest that low expression of these genes indicates a better survival prognosis. Pairwise differential expression analysis of the three subtypes, visualised with volcano plots, yields 266 differentially expressed genes in total. GO and KEGG enrichment analyses, together with node plots, help reveal cancer-related functions and pathways. SSRNN therefore shows high prediction accuracy and robustness, can effectively use omics data to predict survival in KIRC and to screen plausible biomarkers, and has considerable methodological and practical value.

Key words: self-attention, deep learning, omics data, kidney renal clear cell carcinoma, subtype recognition
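
The abstract describes SSRNN only as an encoder plus a classifier built around self-attention and does not give the layer details. As a rough illustration of that general idea, and not the authors' architecture, the following NumPy sketch applies scaled dot-product self-attention to per-gene tokens and feeds the pooled result to a three-class softmax layer. The tokenisation scheme, the dimensions, the random untrained weights, and the names `init_params`, `self_attention` and `classify` are all assumptions made for this example; only the 358 genes and the three subtypes come from the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Scaled dot-product self-attention for one sample.

    tokens: array of shape (n_tokens, d_model).
    Returns the attended tokens (n_tokens, d_k) and the attention weights.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def init_params(n_genes, d_model=8, d_k=8, n_classes=3, seed=0):
    # Hypothetical, untrained parameters; a real model would learn these.
    rng = np.random.default_rng(seed)
    return {
        "gene_embed": rng.normal(scale=0.1, size=(n_genes, d_model)),
        "w_q": rng.normal(scale=0.1, size=(d_model, d_k)),
        "w_k": rng.normal(scale=0.1, size=(d_model, d_k)),
        "w_v": rng.normal(scale=0.1, size=(d_model, d_k)),
        "w_out": rng.normal(scale=0.1, size=(d_k, n_classes)),
        "b_out": np.zeros(n_classes),
    }


def classify(expression, params):
    """Map one expression vector (n_genes,) to probabilities over the subtypes."""
    # Assumed tokenisation: each gene's expression value scales that gene's embedding.
    tokens = expression[:, None] * params["gene_embed"]   # (n_genes, d_model)
    attended, _ = self_attention(
        tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    pooled = attended.mean(axis=0)                         # (d_k,)
    logits = pooled @ params["w_out"] + params["b_out"]    # (n_classes,)
    return softmax(logits)


# Synthetic input standing in for one patient's 358 screened genes.
params = init_params(n_genes=358)
sample = np.random.default_rng(1).normal(size=358)
print(classify(sample, params))  # probabilities for C1, C2, C3 (untrained)
```

With untrained weights the output is close to uniform; the sketch is meant only to show how attention weights over genes can be pooled into a subtype prediction.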