Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (15): 60-64.

• Theory and R&D •


3-gram statistical language model optimization to expression vector design

FANG Gang1, ZHANG Shemin2   

  1. School of Biological and Environmental Engineering, Xi’an University, Xi’an 710065, China
  2. School of Information, Xi’an University of Finance and Economics, Xi’an 710100, China
  • Online: 2016-08-01 Published: 2016-08-12

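Abstract: By repeatedly assembling standard synthetic biology "parts" according to a defined standard, complex expression vectors composed of dozens of functional fragments can be obtained. However, each category of standard part contains many members, and their number continues to grow as new parts are developed. At the final stage of expression vector construction, selecting suitable parts from the many available and assembling them into a functional expression vector is time-consuming, laborious, and error-prone. To address this problem, a statistical language model from natural language processing is adopted, and on the basis of this model a dynamic programming algorithm is applied to optimize expression vector design and identify the best option among the many candidates. This method reduces redundant operations in biological experiments and thereby lowers the cost of expression vector construction.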

Keywords: synthetic biology, statistical language model, dynamic programming algorithm, synthetic biology standard "parts"

Abstract: By assembling BioBrick parts according to a standard, complex genetic constructs composed of dozens of functional blocks can be built. However, each category of BioBrick usually contains many parts, and the number keeps growing as new parts are developed. As a result, assembling more than a few sets of BioBrick parts can be costly, time-consuming, and error-prone, and at the final step of assembly it is difficult to decide which part to select. To address this problem, a dynamic programming algorithm based on a statistical language model is applied. The algorithm optimizes a genetic design against the grammatical model and identifies an optimal solution among the candidates. In this way, redundant experimental operations are avoided, reducing the time and cost of biological experiments.

Key words: synthetic biology, statistical language model, dynamic programming, BioBrick
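
The approach summarized in the abstract, a 3-gram statistical language model searched with dynamic programming, might be sketched roughly as follows. This is only an illustration and not the authors' implementation: the part names, the toy corpus of known designs, the add-one smoothing, and the function names `train_trigram`, `log_prob`, and `best_design` are all assumptions introduced here. Each design is treated as a "sentence" of parts, and one part is chosen per slot so that the trigram log-likelihood of the whole design is maximal.

```python
import math
from collections import Counter

START = "<s>"  # padding symbol for the two positions before the first part


def train_trigram(designs):
    """Count trigrams and their two-part contexts over known part sequences."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for design in designs:
        seq = [START, START] + list(design)
        vocab.update(design)
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            tri[(a, b, c)] += 1
            ctx[(a, b)] += 1
    return tri, ctx, len(vocab)


def log_prob(model, a, b, c):
    """Add-one smoothed log P(c | a, b); the smoothing choice is an assumption."""
    tri, ctx, vocab_size = model
    return math.log((tri[(a, b, c)] + 1) / (ctx[(a, b)] + vocab_size))


def best_design(model, slots):
    """Viterbi-style dynamic programming over slots of candidate parts.

    The state is the pair of the two most recently chosen parts, which is
    all a trigram model needs in order to score the next choice.
    Returns (log-likelihood, list of chosen parts).
    """
    best = {(START, START): (0.0, [])}
    for candidates in slots:
        nxt = {}
        for (a, b), (score, path) in best.items():
            for c in candidates:
                s = score + log_prob(model, a, b, c)
                if (b, c) not in nxt or s > nxt[(b, c)][0]:
                    nxt[(b, c)] = (s, path + [c])
        best = nxt
    return max(best.values(), key=lambda item: item[0])


# Hypothetical corpus of previously built designs (promoter, RBS, gene, terminator).
known_designs = [
    ["pLac", "RBS1", "GFP", "T1"],
    ["pLac", "RBS1", "GFP", "T1"],
    ["pTet", "RBS2", "RFP", "T1"],
    ["pLac", "RBS1", "RFP", "T2"],
]
# Candidate parts for each position of the new design.
slots = [["pLac", "pTet"], ["RBS1", "RBS2"], ["GFP", "RFP"], ["T1", "T2"]]

score, design = best_design(train_trigram(known_designs), slots)
print(design, round(score, 3))
```

Because the state keeps only the last two parts, the work per slot scales with the number of candidate pairs carried forward times the candidates in the next slot, rather than with the number of complete designs, which is what makes the search practical when each category holds many parts.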