Computer Engineering and Applications, 2020, Vol. 56, Issue (1): 165-171. DOI: 10.3778/j.issn.1002-8331.1809-0107

PSO_BFA Optimized Bag of Words Model and Prediction of Protein Subcellular Localization

HU Xuejiao, CHEN Xingjian, ZHAO Nan, XUE Wei   

  1. School of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
  • Online: 2020-01-01 Published: 2020-01-02

PSO_BFA优化词袋模型及蛋白质亚细胞定位预测

胡雪娇,陈行健,赵南,薛卫   

  1. 南京农业大学 信息科学技术学院,南京 210095

Abstract: A bag-of-words (BOW) model optimized by PSO_BFA is proposed. The traditional bag-of-words model has two important parameters: the window size d and the dictionary size k. Particle swarm optimization (PSO) and the bacterial foraging algorithm (BFA) are combined into a new hybrid optimization algorithm called PSO_BFA. During the local search in PSO, the replication and migration behaviors of BFA are added, and the best solution found by PSO_BFA gives the best combination of window size and dictionary size. The optimized BOW model is then combined with amino acid composition and pseudo amino acid composition to extract feature vectors from protein sequences. The experimental results show that the BOW model optimized by PSO_BFA effectively improves the accuracy of protein subcellular localization prediction.
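
The hybrid search described in the abstract can be illustrated with a minimal sketch. The abstract does not say exactly how the BFA steps are scheduled inside PSO, so the following are assumptions: replication copies the better half of the swarm over the worse half every `repro_every` iterations, and migration re-scatters each particle with probability `p_disperse`. The function `toy_error` is a hypothetical stand-in for the prediction error of a BOW model with window size d and dictionary size k; all names and parameter values are illustrative and are not taken from the paper.

```python
import random


def pso_bfa(fitness, bounds, n_particles=20, n_iters=60, w=0.7, c1=1.5, c2=1.5,
            repro_every=10, p_disperse=0.1, seed=0):
    """Minimise `fitness` over the box `bounds` = [(lo, hi), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    pos = [random_point() for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for it in range(1, n_iters + 1):
        # Standard PSO velocity and position update, clamped to the bounds.
        for i in range(n_particles):
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] = min(hi, max(lo, pos[i][j] + vel[i][j]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val

        if it % repro_every == 0:
            # Replication (assumed form): the better half overwrites the worse half.
            order = sorted(range(n_particles), key=lambda i: pbest_val[i])
            half = n_particles // 2
            for good, bad in zip(order[:half], order[half:]):
                pos[bad], vel[bad] = pos[good][:], vel[good][:]
                pbest[bad], pbest_val[bad] = pbest[good][:], pbest_val[good]
            # Migration (assumed form): scatter a particle with small probability.
            for i in range(n_particles):
                if rng.random() < p_disperse:
                    pos[i] = random_point()
                    vel[i] = [0.0] * dim

    return gbest, gbest_val


def toy_error(x):
    """Hypothetical error surface over (d, k); not the paper's objective."""
    d, k = round(x[0]), round(x[1])
    return (d - 7) ** 2 + ((k - 120) / 10) ** 2


# Search window size d in [2, 20] and dictionary size k in [10, 300].
best, best_val = pso_bfa(toy_error, [(2, 20), (10, 300)])
print("d =", round(best[0]), "k =", round(best[1]), "error =", best_val)
```

In a real setting, `toy_error` would be replaced by a routine that builds the BOW features for the given d and k, trains a classifier, and returns its validation error; that step is much more expensive, so the swarm size and iteration count would need to be chosen accordingly.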

Key words: bag of words model, particle swarm optimization, bacterial foraging, subcellular localization prediction
