Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (18): 154-161. DOI: 10.3778/j.issn.1002-8331.2102-0146

• Pattern Recognition and Artificial Intelligence •

Multi-Omics Data Deep Autoencoder Integration for Cancer Subtyping

CAO Yewei, LIU Fei   

  1. School of Software, South China University of Technology, Guangzhou 510006, China
  • Online: 2022-09-15 Published: 2022-09-15

癌症多组学数据深度自编码器整合分型方法

曹业伟,刘飞   

  1. 华南理工大学 软件学院,广州 510006

Abstract: In cancer research, high-throughput sequencing techniques have produced large amounts of complex, heterogeneous data. Although several deep learning and statistical methods have been applied to integrate such data, efficient methods for integrating multi-omics data are still lacking. This paper therefore proposes a deep learning-based multi-omics integration method, deep autoencoder for multi-omics integration (DAEMI). DAEMI is built on an autoencoder, a neural network whose bottleneck layer yields a compressed representation of the original input. Unlike previous deep learning integration studies, DAEMI finds more subtypes with significant survival differences. It does not rely on survival data to select compressed features; instead, it uses all features obtained from the bottleneck layer and applies K-means to identify cancer subtypes. In comparisons with high-order path elucidated similarity (HOPES), similarity network fusion (SNF), iClusterPlus, and moCluster on four cancer datasets and a simulated dataset, DAEMI performs better than the other methods. Functional analysis suggests that neurodegenerative diseases and mitochondrial dysfunction may share some pathways with cancer.

Key words: multi-omics data integration, cancer subtyping, K-means, deep learning, survival analysis

摘要: 在癌症研究中,随着高通量测序技术发展已经产生了海量的复杂数据。尽管有了一些利用深度学习和统计学方法进行多组学数据整合的研究,但目前仍缺乏较为有效率的整合方法。因此提出一种基于深度自编码器的多组学数据整合方法(deep autoencoder for multi-omics integration,DAEMI)。它利用自编码器中的瓶颈层,学习多组学数据的特征表示。与先前利用深度学习整合的研究相比,DAEMI可以发现明显生存差异的癌症亚型。同时因为不需要生存数据来选择特征,DAEMI可以使用更多特征进行K均值聚类,进而完成癌症分型任务。将DAEMI应用于模拟数据集与四个癌症数据集实验,通过与高阶路径相似度网络的融合模型(HOPES)、相似性网络融合(SNF)、iClusterPlus和moCluster进行比较,结合模拟数据集测试结果与真实癌症数据集测试结果来看,DAEMI要优于其他方法。相应的生物功能分析揭示,神经退行性疾病与线粒体功能障碍可能与癌症共享某些生物学通路。

关键词: 多组学数据整合, 癌症分型, K均值, 深度学习, 生存分析
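
The pipeline summarized in the abstract (compress the multi-omics input through an autoencoder bottleneck, then cluster all bottleneck features with K-means) can be sketched roughly as follows. This is a minimal illustration, not the DAEMI implementation. The feature-wise concatenation of the omics matrices, the single tanh hidden layer, the plain NumPy training loop, the hyperparameters, the synthetic data, and all function names (`train_autoencoder`, `encode`, `kmeans`, `demo`) are assumptions made here for illustration only.

```python
import numpy as np


def train_autoencoder(X, n_bottleneck=2, lr=0.001, epochs=2000, seed=0):
    """Train a one-hidden-layer autoencoder with full-batch gradient descent.

    Returns only the encoder parameters (W1, b1); the decoder is used
    during training to compute the reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_bottleneck))
    b1 = np.zeros(n_bottleneck)
    W2 = rng.normal(0.0, 0.1, (n_bottleneck, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # bottleneck activations
        X_hat = H @ W2 + b2                 # linear reconstruction
        err = (X_hat - X) / n               # gradient of 0.5 * squared error, averaged over samples
        dW2 = H.T @ err
        db2 = err.sum(axis=0)
        dH = (err @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        dW1 = X.T @ dH
        db1 = dH.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1


def encode(X, W1, b1):
    """Map samples to their compressed bottleneck representation."""
    return np.tanh(X @ W1 + b1)


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def demo():
    """Run the sketch on synthetic data: 60 samples, two hypothetical omics layers."""
    rng = np.random.default_rng(42)
    groups = np.repeat([0, 1], 30)                    # simulated subtypes
    expr = rng.normal(0, 1, (60, 20)) + 3.0 * groups[:, None]
    meth = rng.normal(0, 1, (60, 15)) - 3.0 * groups[:, None]
    X = np.hstack([expr, meth])                       # assumption: concatenate omics by feature
    X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize each feature
    W1, b1 = train_autoencoder(X, n_bottleneck=2)
    Z = encode(X, W1, b1)                             # use all bottleneck features
    labels = kmeans(Z, k=2)
    return Z, labels, groups


if __name__ == "__main__":
    Z, labels, groups = demo()
    print("bottleneck shape:", Z.shape)
    print("cluster sizes:", np.bincount(labels, minlength=2))
```

Note that no survival information enters the clustering step in this sketch: every bottleneck feature is passed to K-means, mirroring the abstract's statement that compressed features are not selected using survival data. In practice, a deep learning framework and a library K-means would likely replace the hand-written loops.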