Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (35): 57-60.

• 学术探讨 • Previous Articles     Next Articles

Research on parameterized algorithm of Haplotypes assembly MEC problem

XIE Min-zhu1,2,WANG Jian-xin2,CHEN Jian-er2   

  1. 1.College of Physics and Information Science,Hunan Normal University,Changsha 410081,China
    2.School of Information Science and Engineering,Central South University,Changsha 410083,China
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-12-11 Published:2007-12-11
  • Contact: XIE Min-zhu

单体型组装MEC问题的参数化算法研究

谢民主1,2,王建新2,陈建二2   

  1. 1.湖南师范大学 物理与信息科学学院,长沙 410081
    2.中南大学 信息科学与工程学院,长沙 410083
  • 通讯作者: 谢民主

Abstract: The haplotype assembly MEC problem is the computational problem of inducing a pair of haplotypes from an individual’s DNA fragments sequencing data by correcting minimum SNPs.Based on the characters of DNA fragments,the paper introduces a parameterized algorithm of time complexity O(nk22k2+mlogm+mk1) with m fragments,n SNPs,the maximum number of SNP sites that a fragment covers k1(usually smaller than 10) and the maximum number of the fragments covering a SNP site k2(usually no more than 10).For the practical fragment data,the algorithm can solve the MEC problem efficiently even if m and n are larger and it is scalable and applicable in practice.

Key words: bioinformatics, haplotyping, parameterized algorithm, SNPs(Single-Nucleotide Polymorphisms)

摘要: 单体型组装MEC问题指如何利用个体的DNA测序片断数据,翻转最少的SNP位点值以确定该个体单体型的计算问题。根据片段数据的特点提出了一个时间复杂度为 O(nk22k2+mlogm+mk1)的参数化算法,其中m为片段数,n为单体型的SNP位点数,k1为一个片断覆盖的最大SNP位点数(通常小于10),k2为覆盖同一SNP位点的片段的最大数(通常不大于10)。对于实际DNA测序中的片段数据,即使mn都相当大,该算法也可以在较短的时间得到MEC问题的精确解,具有良好的可扩展性和较高的实用价值。

关键词: 生物信息学, 单体型检测, 参数化算法, 单核苷酸多态性