Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (28): 36-41.

• Doctoral Forum •

Using computational methods to identify genomic features that define intron retention

MA Meng1, WANG Yang2

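  1. School of Computer Science and Technology, Anhui University, Hefei 230032, China
    2. Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7365, USA
  • Online: 2012-10-01 Published: 2012-09-29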

Using computational methods to identify genomic features that define intron retention

MA Meng1, WANG Yang2   

  1. School of Computer Science and Technology, Anhui University, Hefei 230032, China
    2. Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7365, USA
  • Online: 2012-10-01 Published: 2012-09-29

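Abstract: More than 90% of human genes undergo alternative splicing. Correctly identifying the different alternative splicing patterns is important for a deeper understanding of the mechanisms that regulate gene splicing. Intron retention is a relatively common splicing pattern. Starting from the genomic sequence itself, this work uses computational methods to identify the genomic features that define intron retention and extracts those that differ with statistical significance. Classification and prediction experiments with three classifiers (SVM, NN, NB) were run to validate these features: the average prediction precision for retained introns reached 70% and the overall average precision reached 90%, a good result. The approach can also be applied to the study of other gene splicing patterns.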

Key words: gene splicing, intron retention, genomic features

Abstract: Over 90% of human genes undergo alternative splicing. Correctly identifying the different alternative splicing events is important for understanding how gene splicing is regulated. Intron retention is a fairly common splicing event. Working from the genomic sequences, this paper uses computational methods to identify the genomic features that define intron retention and extracts the features that differ with statistical significance. Three classification methods are then used to test the discriminative power of these features. The experimental results show an average precision of up to 70% for retained introns and an average overall precision of 90%, which indicates that the extracted genomic features discriminate well. The methods of this paper can also be applied to other gene splicing events.

Key words: gene splicing, intron retention, genomic features
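
The workflow summarized in the abstract (derive features from intron sequences, then check how well a classifier separates retained introns from the rest) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two features (log length and GC fraction), the synthetic sequences and their class differences, and the hand-written Gaussian naive Bayes classifier, which stands in for only one of the three classifiers the abstract names (assuming NB denotes naive Bayes). The names `intron_features`, `GaussianNaiveBayes`, `random_intron`, `make_dataset`, and `run_demo` are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def intron_features(seq):
    """Two illustrative features: log10 of length and GC fraction."""
    seq = seq.upper()
    gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)
    return [math.log10(len(seq)), gc_fraction]


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes over numeric feature vectors."""

    def fit(self, X, y):
        self.stats = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            columns = list(zip(*rows))
            self.stats[label] = (
                math.log(len(rows) / len(X)),            # log prior
                [mean(c) for c in columns],              # per-feature mean
                [pvariance(c) + 1e-9 for c in columns],  # per-feature variance
            )
        return self

    def predict(self, x):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, means, variances)
            )
        return max(self.stats, key=log_posterior)


def random_intron(rng, length, gc):
    """Synthetic sequence with the requested length and expected GC fraction."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def make_dataset(rng, n_per_class):
    """Toy data: the class differences here are invented for illustration."""
    X, y = [], []
    for _ in range(n_per_class):
        X.append(intron_features(random_intron(rng, rng.randint(80, 300), 0.6)))
        y.append("retained")
        X.append(intron_features(random_intron(rng, rng.randint(500, 3000), 0.4)))
        y.append("spliced")
    return X, y


def run_demo(seed=0):
    """Train on one synthetic set and return accuracy on a held-out set."""
    rng = random.Random(seed)
    X_train, y_train = make_dataset(rng, 100)
    X_test, y_test = make_dataset(rng, 50)
    model = GaussianNaiveBayes().fit(X_train, y_train)
    hits = sum(model.predict(x) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)


if __name__ == "__main__":
    print(f"held-out accuracy on toy data: {run_demo():.2f}")
```

Because the synthetic classes are deliberately well separated, the accuracy printed by this sketch says nothing about real introns; it only shows the shape of a feature-extraction and validation loop of the kind the abstract describes.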