Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (10): 171-178. DOI: 10.3778/j.issn.1002-8331.1910-0273

• Pattern Recognition and Artificial Intelligence •

Hybrid Particle Swarm Optimization and Improved Bacterial Foraging Optimization to Classify Imbalanced Data

HUANG Jianqiong, GUO Wenlong

  1. School of Technology, Fuzhou University of International Studies and Trade, Fuzhou 350202, China
  2. College of Electronics and Information Science, Fujian Jiangxia University, Fuzhou 350108, China
  • Publication date: 2020-05-15 Release date: 2020-05-13

Abstract:

The Bacterial Foraging Optimization (BFO) algorithm is prone to becoming trapped in local optima. To address this shortcoming, this paper proposes a hybrid of Particle Swarm Optimization (PSO) and Improved Bacterial Foraging Optimization (IBFO) for classifying imbalanced data. Three datasets are used to test the performance of the proposed algorithm: a real ovarian cancer microarray dataset, and the spam email and zoo datasets from the UCI repository. The imbalanced data are first preprocessed with the Borderline Synthetic Minority Oversampling Technique (Borderline-SMOTE) and Tomek Link, and then classified with the proposed algorithm. BFO is improved in three ways. First, the chemotaxis process is modified: PSO performs the search first, and the resulting particles are then treated as bacteria, which improves the global search ability of BFO. Second, the reproduction operation is modified to raise the survival-of-the-fittest selection criterion. Third, the elimination and dispersal operation is modified to keep the population from falling into local optima and to prevent evolutionary stagnation. Simulation results show that the proposed algorithm achieves higher classification accuracy than existing methods.

Key words: imbalanced data, improved bacterial foraging optimization, particle swarm optimization
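
The first improvement described in the abstract (PSO searches first, then the particles are handled as bacteria during chemotaxis) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `sphere` and `pso_then_chemotaxis`, the toy objective, the PSO coefficients (0.7, 1.5, 1.5), the step size, and the swim length are all assumptions chosen for illustration. The chemotaxis stage is a greedy simplification that only accepts improving moves, and the reproduction and elimination-dispersal steps are omitted.

```python
import random

def sphere(x):
    """Toy objective to minimize (an assumption; the paper tunes a classifier)."""
    return sum(v * v for v in x)

def pso_then_chemotaxis(f, dim=2, n=10, pso_iters=30, chem_steps=20,
                        step=0.1, swim_len=4, seed=0):
    """Hypothetical two-stage search: PSO first, then BFO-style chemotaxis."""
    rng = random.Random(seed)
    # Random initial swarm in [-5, 5]^dim with zero velocity.
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pcost = [f(p) for p in pos]
    initial_best = min(pcost)
    g = min(range(n), key=pcost.__getitem__)
    gbest, gcost = pbest[g][:], pcost[g]

    # Stage 1: standard PSO velocity and position updates.
    for _ in range(pso_iters):
        for i in range(n):
            for d in range(dim):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = f(pos[i])
            if c < pcost[i]:
                pbest[i], pcost[i] = pos[i][:], c
                if c < gcost:
                    gbest, gcost = pos[i][:], c

    # Stage 2: treat each particle's best position as a bacterium and
    # run chemotaxis (tumble, then swim while the cost keeps dropping).
    bacteria, cost = [p[:] for p in pbest], pcost[:]
    for _ in range(chem_steps):
        for i in range(n):
            # Tumble: draw a random unit direction.
            delta = [rng.uniform(-1, 1) for _ in range(dim)]
            norm = sum(v * v for v in delta) ** 0.5 or 1.0
            direction = [v / norm for v in delta]
            # Swim: move along that direction only while it improves.
            for _ in range(swim_len):
                trial = [b + step * u for b, u in zip(bacteria[i], direction)]
                c = f(trial)
                if c >= cost[i]:
                    break
                bacteria[i], cost[i] = trial, c

    best = min(range(n), key=cost.__getitem__)
    return bacteria[best], cost[best], initial_best
```

Calling `pso_then_chemotaxis(sphere)` returns the best position, its cost, and the best cost of the initial swarm for comparison. Because both stages only record improving moves, the final cost can never exceed the initial best. In a classification setting the objective would presumably be an error measure of the classifier rather than this toy function.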