Computer Engineering and Applications, 2021, Vol. 57, Issue (19): 282-289. DOI: 10.3778/j.issn.1002-8331.2006-0231


Detection Algorithm of Pathogenic Microbes from Next-Generation Sequencing Data

LI Jie, LI Miao, YUAN Xiguo   

  1. College of Commerce, Xi’an Fanyi University, Xi’an 710105, China
  2. School of Computer Science and Technology, Xidian University, Xi’an 710071, China
  • Online: 2021-10-01 Published: 2021-09-29

A Pathogenic Microbe Detection Algorithm for Next-Generation Sequencing Data

李杰,李苗,袁细国   

  1. College of Commerce, Xi’an Fanyi University, Xi’an 710105
  2. School of Computer Science and Technology, Xidian University, Xi’an 710071

Abstract:

Pathogenic microbes are one of the major factors driving the spread of cross-infection and even of major infectious diseases. Accurate detection of pathogenic microbes is therefore of great value for the precise diagnosis and treatment of infectious diseases. Traditional methods usually rely on cultivation to observe and identify pathogenic microbes. However, because only a limited range of microbes can be cultured, such methods struggle to meet the demand of modern precision medicine for accurate and complete detection. New detection methods working at the DNA molecular level have recently been developed and have attracted close attention in the community. The key issue is how to use the DNA sequencing data of a tested sample, together with statistical computation or machine learning methods, to identify which pathogenic microbes the sample contains. This paper focuses on the 16S rDNA region in the setting of next-generation sequencing data and establishes a detection algorithm for pathogenic microbes based on a naive Bayes model. The principle of the proposed method is as follows: the 16S rDNA sequencing reads are first aligned to the reference sequences of pathogenic microbes; three features are then extracted from the alignment state; and a naive Bayes classification model built on these features determines whether each microbe in the pathogenic microbe library is present in the tested sample. Finally, the proposed algorithm is tested through simulation experiments and compared with several peer algorithms; the results indicate the advantages of the proposed algorithm.

Key words: next-generation sequencing data, detection of pathogenic microbes, feature extraction, Naive Bayesian model

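Abstract (Chinese version):

Pathogenic microbes are one of the important factors behind the spread of cross-infection and even of major infectious diseases, and detecting them accurately is of great significance and value for effective prevention and for the precise diagnosis and treatment of infectious diseases. Traditional detection methods usually rely on cultivation for observation and identification, but because the range of microbes that can be cultured is limited, they can hardly meet the requirement of modern precision medicine for accurate and complete detection of pathogenic microbes. New detection approaches working at the DNA molecular level have been developed and are receiving close attention; their core problem is how to use the DNA sequencing data of the tested sample, together with statistical computation or machine learning methods, to determine which pathogenic microbes the sample contains. Set against the background of next-generation sequencing data and taking 16S rDNA sequences as the object of analysis, this paper establishes a naive Bayes based algorithm for the precise detection of pathogenic microbes. Its core idea is to align the sequencing reads of the 16S rDNA sequences to the reference genome sequences of pathogenic microbes, extract three features from the alignment state, and build on them a naive Bayes classification model that decides whether each microbe in the pathogenic microbe library is present in the tested sample, thereby achieving precise detection. Finally, simulation experiments verify the effectiveness of the proposed algorithm, and a comparison with peer algorithms from the international literature shows its advantages.

Key words: next-generation sequencing technology, pathogenic microbe detection, feature extraction, naive Bayes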
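
The classification step described in the abstract (three alignment-derived features per candidate microbe, fed to a naive Bayes model that decides presence or absence) can be illustrated with a minimal sketch. The abstract does not name the three features, so the ones used here (fraction of reads aligned, coverage breadth, mean alignment identity) are hypothetical stand-ins, the Gaussian likelihood is an assumed modelling choice, and all numbers are invented toy values rather than results from the paper.

```python
import math
from collections import defaultdict

# ASSUMPTION: the three features are hypothetical stand-ins for the paper's
# unnamed features: (fraction of reads aligned, coverage breadth, mean identity).

def fit_gaussian_nb(samples, labels):
    """Estimate the log prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance strictly positive.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-6)
            for col, m in zip(columns, means)
        ]
        model[y] = (math.log(n / len(samples)), means, variances)
    return model

def log_scores(model, x):
    """Unnormalised log posterior per class, treating features as independent."""
    scores = {}
    for y, (log_prior, means, variances) in model.items():
        score = log_prior
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[y] = score
    return scores

def predict(model, x):
    """Return the class with the highest log posterior."""
    scores = log_scores(model, x)
    return max(scores, key=scores.get)

# Toy training data (invented for illustration only).
train_x = [
    (0.30, 0.95, 0.99), (0.25, 0.90, 0.98), (0.20, 0.85, 0.97),  # present
    (0.01, 0.10, 0.80), (0.02, 0.15, 0.82), (0.00, 0.05, 0.78),  # absent
]
train_y = ["present"] * 3 + ["absent"] * 3
model = fit_gaussian_nb(train_x, train_y)

# One feature vector per microbe in a hypothetical reference library.
candidates = {
    "microbe_A": (0.28, 0.92, 0.99),
    "microbe_B": (0.01, 0.08, 0.79),
}
detected = [name for name, feats in candidates.items()
            if predict(model, feats) == "present"]
print(detected)
```

Each candidate in the library is classified independently, which mirrors the abstract's framing of deciding, microbe by microbe, whether it is present in the tested sample. The alignment and feature extraction steps are omitted here because the abstract gives no detail on them.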