Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (1): 274-281. DOI: 10.3778/j.issn.1002-8331.2007-0331

• Engineering and Applications •

Copy Number Variation Detection of BP Neural Network Based on Genetic Algorithm

HUANG Tihao, LI Junqing, ZHAO Haiyong

  1. School of Computer Science and Technology, Liaocheng University, Liaocheng, Shandong 252000, China
  • Online: 2022-01-01 Published: 2022-01-06

Abstract: Copy number variation (CNV) is a major form of genomic structural variation that produces amplifications or deletions of varying size in genomic regions. Existing CNV detection algorithms are affected by factors such as GC-content bias and sequencing errors, which lowers their detection ability. To address this problem, a CNV detection algorithm based on a BP neural network optimized by a genetic algorithm is proposed. The algorithm takes full account of the inherent correlation between adjacent genomic positions, fuses multiple features, and uses a BP neural network to model the joint effect of these features in order to predict CNVs. To address the shortcomings of the existing BP neural network model, a genetic algorithm is used to optimize the weights and thresholds of the BP neural network, which improves the CNV detection performance of the algorithm. Experimental results on 300 samples with different sequencing coverage depths and tumor purities show that the algorithm achieves an average detection sensitivity of 97.27%, an average detection precision of 97.78%, and an average F1 score of 97.53%, all better than those of the other algorithms compared, and that it significantly reduces the sample boundary bias.
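The three reported metrics are related in the usual way: the F1 score is the harmonic mean of sensitivity and precision. The sketch below shows that relationship under stated assumptions. The function name `detection_metrics` and the counts of true positives, false positives, and false negatives are hypothetical, and the abstract does not say how a detected CNV is matched against the ground truth.

```python
from typing import Tuple


def detection_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (sensitivity, precision, F1) from counts of CNV calls.

    tp: calls that match a true CNV, fp: calls with no true CNV,
    fn: true CNVs that were missed (all counts are hypothetical inputs).
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    total = sensitivity + precision
    f1 = 2.0 * sensitivity * precision / total if total else 0.0
    return sensitivity, precision, f1
```

Averaging per-sample F1 scores, as the abstract appears to do, need not equal the F1 computed from the averaged sensitivity and precision, so small differences between the two are expected.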

Key words: copy number variation (CNV), BP neural network, genetic algorithm, read depth
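The general idea of using a genetic algorithm to set the weights and thresholds of a BP neural network can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the synthetic features and labels, the selection, crossover, and mutation settings, and the choice to refine the best chromosome with plain gradient-descent backpropagation afterwards are all assumptions, since the abstract does not specify them.

```python
import numpy as np

N_IN, N_HID = 4, 6                                # hypothetical layer sizes
N_PARAMS = N_IN * N_HID + N_HID + N_HID + 1       # weights plus thresholds (biases)


def unpack(theta):
    """Split a flat chromosome into hidden/output weights and thresholds."""
    a = N_IN * N_HID
    b = a + N_HID
    c = b + N_HID
    return theta[:a].reshape(N_IN, N_HID), theta[a:b], theta[b:c], theta[c]


def forward(theta, x):
    """One tanh hidden layer and a sigmoid output; returns (prediction, hidden)."""
    w1, b1, w2, b2 = unpack(theta)
    h = np.tanh(x @ w1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2))), h


def mse(theta, x, y):
    """Mean squared error of the network defined by chromosome theta."""
    return float(np.mean((forward(theta, x)[0] - y) ** 2))


def ga_search(x, y, rng, pop_size=40, generations=30):
    """Evolve weight vectors; returns (best chromosome, best error per generation)."""
    pop = rng.normal(0.0, 1.0, size=(pop_size, N_PARAMS))
    history = []
    for _ in range(generations):
        errs = np.array([mse(ind, x, y) for ind in pop])
        history.append(float(errs.min()))
        # Truncation selection: the better half survives unchanged (elitism).
        elite = pop[np.argsort(errs)[: pop_size // 2]]
        n_child = pop_size - len(elite)
        pa = elite[rng.integers(0, len(elite), n_child)]
        pb = elite[rng.integers(0, len(elite), n_child)]
        # Uniform crossover followed by sparse Gaussian mutation.
        children = np.where(rng.random(pa.shape) < 0.5, pa, pb)
        mutate = rng.random(children.shape) < 0.1
        children = children + mutate * rng.normal(0.0, 0.3, children.shape)
        pop = np.vstack([elite, children])
    errs = np.array([mse(ind, x, y) for ind in pop])
    history.append(float(errs.min()))
    return pop[np.argmin(errs)].copy(), history


def bp_step(theta, x, y, lr=0.5):
    """One backpropagation (gradient descent) step on the MSE loss."""
    _, _, w2, _ = unpack(theta)
    p, h = forward(theta, x)
    dz = 2.0 * (p - y) * p * (1.0 - p) / len(y)   # gradient at output pre-activation
    da = np.outer(dz, w2) * (1.0 - h ** 2)        # gradient at hidden pre-activation
    grad = np.concatenate([(x.T @ da).ravel(), da.sum(axis=0), h.T @ dz, [dz.sum()]])
    return theta - lr * grad


rng = np.random.default_rng(0)
x = rng.normal(size=(200, N_IN))                  # synthetic stand-in for fused features
y = (x[:, 0] + 0.5 * x[:, 1] > 0).astype(float)   # synthetic stand-in for CNV labels
theta, history = ga_search(x, y, rng)             # GA picks the starting weights
for _ in range(200):                              # BP refines them
    theta = bp_step(theta, x, y)
```

Because the better half of each generation is carried over unchanged, the best error recorded in `history` never increases from one generation to the next. A common motivation for this kind of hybrid is that the global search makes the subsequent gradient-based training less dependent on a poor random initialization.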