Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (12): 132-139. DOI: 10.3778/j.issn.1002-8331.1811-0059


Research on HP Model Optimization Method Based on Reinforcement Learning

WU Hongjie1,2, YANG Ru1, FU Qiming1, CHEN Jianping1, LU Weizhong1   

  1. School of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
  2. Jiangsu Provincial Key Lab for Information Processing Technologies, Soochow University, Suzhou, Jiangsu 215006, China
  Online: 2019-06-15    Published: 2019-06-13

Abstract: Protein structure prediction is an important problem in bioinformatics. Predicting the two-dimensional structure of a protein under the Hydrophobic Polarity (HP) model is a typical Non-deterministic Polynomial (NP)-hard problem. Existing HP model optimization methods include the greedy algorithm, particle swarm optimization, the genetic algorithm, the ant colony algorithm and Monte Carlo simulation; however, these methods are not sufficiently robust and easily fall into local optima. Therefore, an HP model optimization method based on reinforcement learning is proposed. Over the full state space, a reward function based on the energy function is designed and a rigid overlap detection rule is introduced. By exploiting the sequential Markov optimal decision process and maximizing the global cumulative return, the global evolutionary relationships in biological sequences are fully utilized, yielding effective and stable predictions. Eight classical sequences, three from published papers and five from Uniref50, are selected as experimental subjects, and robustness, convergence and running time are compared against the greedy algorithm and the particle swarm optimization algorithm. Both the proposed method and particle swarm optimization find the lowest-energy structures of all eight sequences, whereas the greedy algorithm converges on only 62.5% of them; the proposed method converges stably on every sequence after 50,000 training iterations, and its running time is 63.9% lower than that of particle swarm optimization.
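To make the energy function and the rigid overlap detection rule described above concrete, the sketch below evaluates an HP conformation on a 2D square lattice, assuming the conventional formulation in which every pair of non-consecutive H residues occupying adjacent lattice sites contributes -1 to the energy. The function names (fold_to_coords, hp_energy), the relative-move encoding (L/F/R turns) and the example sequence are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of HP-model evaluation on a 2D square lattice.
# Assumption: energy = -1 per non-consecutive H-H pair on adjacent sites;
# a walk that revisits an occupied site is rejected (rigid overlap rule).

from typing import List, Optional, Tuple

Coord = Tuple[int, int]

# Relative moves: keep direction (F), turn left (L), turn right (R).
TURNS = {"F": 0, "L": 1, "R": -1}
# Unit vectors for the four lattice directions (up, left, down, right).
DIRS: List[Coord] = [(0, 1), (-1, 0), (0, -1), (1, 0)]


def fold_to_coords(moves: List[str]) -> Optional[List[Coord]]:
    """Map relative moves to lattice coordinates; None if the walk overlaps."""
    coords = [(0, 0), (0, 1)]          # the first two residues are fixed
    occupied = set(coords)
    direction = 0                      # index into DIRS, initially facing "up"
    for m in moves:
        direction = (direction + TURNS[m]) % 4
        x, y = coords[-1]
        dx, dy = DIRS[direction]
        nxt = (x + dx, y + dy)
        if nxt in occupied:            # rigid overlap: conformation invalid
            return None
        coords.append(nxt)
        occupied.add(nxt)
    return coords


def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Count topological H-H contacts; each contributes -1 to the energy."""
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):      # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    energy -= 1
    return energy


if __name__ == "__main__":
    seq = "HPHPPHHPHPPHPHHPPHPH"                   # example 20-residue H/P string
    moves = (["L", "F", "R"] * len(seq))[: len(seq) - 2]
    coords = fold_to_coords(moves)
    if coords is None:
        print("overlapping (invalid) conformation")
    else:
        print("energy =", hp_energy(seq, coords))
```

Under this convention the agent's reward can simply be taken as the negative of hp_energy (the number of H-H contacts), so that maximizing the cumulative return is equivalent to minimizing the energy of the final conformation.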

Key words: reinforcement learning, Hydrophobic Polarity (HP) model, structure prediction

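The abstract frames folding as a sequential Markov decision problem whose global cumulative return is maximized. The exact formulation is not given here, so the following is only a plausible tabular Q-learning sketch under stated assumptions: the state is the tuple of relative moves chosen so far, the actions are L/F/R turns, an overlap terminates the episode with a fixed penalty (the rigid overlap detection rule), and the terminal reward is the number of H-H contacts, i.e. the negative HP energy. All names, hyperparameters and the episode budget are illustrative, not taken from the paper.

```python
# Hypothetical tabular Q-learning sketch for the HP folding MDP;
# an illustration of the formulation, not the paper's implementation.

import random
from collections import defaultdict

DIRS = [(0, 1), (-1, 0), (0, -1), (1, 0)]   # up, left, down, right
TURNS = {"F": 0, "L": 1, "R": -1}           # relative turns
ACTIONS = ("L", "F", "R")


def walk(moves):
    """Lattice coordinates for the partial fold, or None on overlap."""
    coords, direction = [(0, 0), (0, 1)], 0
    seen = set(coords)
    for m in moves:
        direction = (direction + TURNS[m]) % 4
        nxt = (coords[-1][0] + DIRS[direction][0],
               coords[-1][1] + DIRS[direction][1])
        if nxt in seen:
            return None
        coords.append(nxt)
        seen.add(nxt)
    return coords


def contacts(seq, coords):
    """Number of non-consecutive H-H lattice contacts (= -energy)."""
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 2, len(seq))
        if seq[i] == seq[j] == "H"
        and abs(coords[i][0] - coords[j][0]) + abs(coords[i][1] - coords[j][1]) == 1
    )


def q_learn(seq, episodes=50000, alpha=0.1, gamma=1.0, eps=0.2, penalty=-2.0):
    Q = defaultdict(float)                      # Q[(state, action)]
    best = 0                                    # best contact count seen so far
    for _ in range(episodes):
        state = ()                              # moves chosen so far
        while len(state) < len(seq) - 2:
            # epsilon-greedy action selection over the three relative turns
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(state, x)])
            nxt_state = state + (a,)
            coords = walk(nxt_state)
            if coords is None:                  # rigid overlap ends the episode
                r, done = penalty, True
            elif len(nxt_state) == len(seq) - 2:
                r, done = float(contacts(seq, coords)), True
                best = max(best, int(r))
            else:
                r, done = 0.0, False
            # one-step Q-learning update
            target = r if done else r + gamma * max(Q[(nxt_state, x)] for x in ACTIONS)
            Q[(state, a)] += alpha * (target - Q[(state, a)])
            state = nxt_state
            if done:
                break
    return best


if __name__ == "__main__":
    print("best contacts found:", q_learn("HPHPPHHPHPPHPHHPPHPH", episodes=5000))
```

Because the return is undiscounted and granted only at the terminal step, maximizing it is the same as minimizing the HP energy of the completed fold; the 50,000-episode default echoes the training budget mentioned in the abstract, but any value can be passed in.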