Computer Engineering and Applications, 2009, Vol. 45, Issue (29): 118-119. DOI: 10.3778/j.issn.1002-8331.2009.29.035

• Section: Database, Signal and Information Processing

Gene identification using multiple statistical features

MA Bao-shan¹, ZHU Yi-sheng¹, CHEN Yu-zhen²

  1. College of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026, China
  2. College of Electromechanics and Materials Engineering, Dalian Maritime University, Dalian, Liaoning 116026, China
  • Received: 2008-11-12  Revised: 2009-01-19  Online: 2009-10-11  Published: 2009-10-11
  • Corresponding author: MA Bao-shan

Abstract: Gene identification based on statistical features performs well on long gene sequences but remains unsatisfactory on short ones. This paper studies base composition, period-3 behavior, codon usage, and base position correlation separately, and then proposes a gene identification algorithm that combines these multiple features. Experimental results indicate that, for gene sequences shorter than 90 bp (base pairs), the average accuracy of the proposed algorithm is 2.2% higher than that of the existing approach.

Key words: bioinformatics, gene identification, statistical feature
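The abstract names the feature families but does not define how each is computed. The sketch below shows one common way to turn a DNA sequence into such features: base composition, a period-3 measure taken from the Fourier spectrum of the four base-indicator sequences at frequency 2π/3, and codon usage in reading frame 0. It also computes per-codon-position base frequencies as a simple, hypothetical stand-in for position-related features. All function names, the normalization, and the 81-value feature layout are illustrative assumptions, not the authors' algorithm, and no classifier is included.

```python
import cmath
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 64 codons in a fixed order so feature vectors are comparable across sequences.
CODONS = ["".join(c) for c in product(BASES, repeat=3)]


def base_composition(seq: str) -> list[float]:
    """Relative frequency of A, C, G and T in the sequence."""
    counts = Counter(seq)
    return [counts[b] / len(seq) for b in BASES]


def period3_strength(seq: str) -> float:
    """Spectral power at frequency 2*pi/3, summed over the four binary
    indicator sequences and divided by the sequence length N.

    For a sequence containing only A/C/G/T, N equals the mean power over
    all DFT bins (Parseval), so values well above 1 suggest period-3 behavior.
    """
    w = cmath.exp(-2j * cmath.pi / 3)
    total = 0.0
    for b in BASES:
        # DFT coefficient at 2*pi/3 of u_b[k] = 1 if seq[k] == b else 0;
        # w has period 3 in k, so w ** (k % 3) equals w ** k.
        coeff = sum(w ** (k % 3) for k, ch in enumerate(seq) if ch == b)
        total += abs(coeff) ** 2
    return total / len(seq)


def codon_usage(seq: str) -> list[float]:
    """Frequency of each of the 64 codons in reading frame 0."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(codons)
    total = len(codons)
    return [counts[c] / total if total else 0.0 for c in CODONS]


def position_composition(seq: str) -> list[float]:
    """Hypothetical position feature: base frequencies at codon positions 1, 2, 3."""
    features = []
    for pos in range(3):
        column = seq[pos::3]
        counts = Counter(column)
        features.extend(counts[b] / len(column) if column else 0.0 for b in BASES)
    return features


def feature_vector(seq: str) -> list[float]:
    """Concatenate all features: 4 + 1 + 64 + 12 = 81 values."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    return (base_composition(seq) + [period3_strength(seq)]
            + codon_usage(seq) + position_composition(seq))


if __name__ == "__main__":
    coding_like = "ATG" + "GCT" * 20 + "TAA"  # toy sequence with strong codon periodicity
    print(len(feature_vector(coding_like)))
    print(period3_strength(coding_like), period3_strength("ACGT" * 15))
```

In this toy run the repeated `GCT` codon gives a period-3 value far above 1, while the period-4 sequence `ACGT` repeated gives a value of essentially 0. A real gene finder would feed vectors like these into a trained classifier; which features and which classifier the paper actually uses is not stated in the abstract.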

