Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (29): 118-119. DOI: 10.3778/j.issn.1002-8331.2009.29.035

• Database, Signal and Information Processing •

Gene identification using multiple statistical features

MA Bao-shan (马宝山)1, ZHU Yi-sheng (朱义胜)1, CHEN Yu-zhen (陈玉珍)2

  1. College of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026, China
  2. College of Electromechanics and Materials Engineering, Dalian Maritime University, Dalian, Liaoning 116026, China
  • Received: 2008-11-12  Revised: 2009-01-19  Online: 2009-10-11  Published: 2009-10-11
  • Corresponding author: MA Bao-shan

Abstract: Gene identification algorithms based on statistical features achieve high prediction accuracy on long sequences, but their accuracy on short gene sequences remains unsatisfactory. Four properties of gene sequences are studied separately: base composition, period-3 behavior, codon usage frequency, and base position correlation. On this basis, a gene identification algorithm that combines multiple features is proposed. Experimental results show that, for gene sequences shorter than 90 bp (base pairs), the average prediction accuracy of the proposed algorithm is 2.2% higher than that of the existing approach.

Key words: bioinformatics, gene identification, statistical features
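
The four feature families named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details are not given on this page. All function names are hypothetical, the period-3 measure assumes the common approach of taking the discrete Fourier transform of 0/1 base-indicator sequences at frequency 1/3, and the per-codon-position base frequencies are only an assumed stand-in for "base position correlation". Inputs are assumed to be non-empty strings over A, C, G, T with at least three bases.

```python
import cmath
from collections import Counter

BASES = "ACGT"


def base_composition(seq):
    """Fraction of each base (A, C, G, T) in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {b: counts[b] / len(seq) for b in BASES}


def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the four
    0/1 indicator sequences (one per base). Assumed measure,
    not necessarily the one used in the paper."""
    seq = seq.upper()
    total = 0.0
    for b in BASES:
        # DFT coefficient of the indicator sequence of base b at frequency 1/3
        coeff = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, ch in enumerate(seq)
            if ch == b
        )
        total += abs(coeff) ** 2
    return total


def codon_usage(seq):
    """Relative frequency of each codon, read in frame 0.
    Trailing bases that do not fill a codon are ignored."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(codons)
    return {c: k / len(codons) for c, k in counts.items()}


def position_base_freq(seq):
    """Base frequencies at codon positions 1, 2 and 3 (frame 0).
    A simple proxy for position-dependent base statistics."""
    seq = seq.upper()
    freqs = []
    for pos in range(3):
        sub = seq[pos::3]  # every third base, starting at this codon position
        counts = Counter(sub)
        freqs.append({b: counts[b] / len(sub) for b in BASES})
    return freqs


if __name__ == "__main__":
    example = "ATG" * 10  # toy sequence with perfect period-3 structure
    print(base_composition(example))
    print(period3_power(example))
    print(codon_usage(example))
    print(position_base_freq(example))
```

On the toy sequence `"ATG" * 10` the period-3 power is about 300, while a homopolymer such as `"A" * 30` gives a value near zero. A classifier for short sequences would presumably combine features like these into one vector, but how the paper weights or combines them is not stated here.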
