Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (4): 61-67. DOI: 10.3778/j.issn.1002-8331.2002-0268


New Differential Evolution with Particle Swarm Optimization Algorithm for Text Clustering

HU Xiaomin, WANG Mingfeng, ZHANG Shourong, LI Min   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  2. School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2021-02-15 Published: 2021-02-06

用于文本聚类的新型差分进化粒子群算法

胡晓敏,王明丰,张首荣,李敏   

  1. 广东工业大学 计算机学院,广州 510006
  2. 广东工业大学 信息工程学院,广州 510006

Abstract:

In text clustering, where the features are high-dimensional and sparse, the Particle Swarm Optimization (PSO) algorithm tends to become trapped in local optima in the later iterations. To address this, a Differential Evolution (DE) strategy with better diversity is added to update the population and search for a better global optimum. In addition, the centroids within each individual are arranged in a random order, which affects how individuals learn from and update one another. A method for self-adaptively adjusting the centroid order is therefore proposed, so that the most similar centroids of different individuals are placed at the same cluster index as far as possible. Finally, tests on text datasets show that the proposed clustering Index adaptive DEPSO (IDEPSO) algorithm has advantages over other existing algorithms on both internal and external indicators, which demonstrates the effectiveness and feasibility of the algorithm.
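
The centroid-order adjustment can be illustrated with a minimal sketch. The abstract does not specify the matching procedure, so the greedy pairing, the use of cosine similarity, and the names `cosine` and `align_centroids` below are all assumptions made for illustration and are not the authors' implementation. Each individual is taken to be a list of K centroid vectors, and one individual is reordered so that each of its centroids sits at the index of the most similar centroid in a reference individual.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_centroids(reference, individual):
    """Reorder `individual` so similar centroids share an index with `reference`.

    Assumption: greedy matching, taking the most similar remaining
    (reference, individual) centroid pair first. Both inputs must hold
    the same number K of centroids.
    """
    k = len(reference)
    # All K*K candidate pairs, most similar first.
    pairs = sorted(
        ((cosine(reference[i], individual[j]), i, j)
         for i in range(k) for j in range(k)),
        key=lambda t: -t[0],
    )
    aligned = [None] * k
    used_ref, used_ind = set(), set()
    for _sim, i, j in pairs:
        if i in used_ref or j in used_ind:
            continue  # each centroid is matched at most once
        aligned[i] = individual[j]
        used_ref.add(i)
        used_ind.add(j)
    return aligned


# Hypothetical toy data: three centroids in a three-term vocabulary.
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
individual = [[0.0, 0.9, 0.1], [0.1, 0.0, 0.9], [0.8, 0.2, 0.0]]
aligned = align_centroids(reference, individual)
```

After alignment, index-wise operations between individuals, such as a DE difference vector or a PSO velocity update, combine centroids that describe similar clusters rather than arbitrary ones. An optimal one-to-one assignment could replace the greedy pass at higher cost; which variant the paper uses is not stated in the abstract.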

Key words: text clustering, high dimension, Particle Swarm Optimization (PSO), Differential Evolution (DE), K-means

摘要:

针对粒子群优化(Particle Swarm Optimization,PSO)算法在维度高、特征稀疏的文本聚类过程中,随着算法迭代次数增加在后期陷入局部最优的问题,提出采用多样性更好的差分进化(Differential Evolution,DE)策略更新种群,尝试找到更好的全局最优解。考虑到种群个体间包含的聚类中心向量排列顺序的随机性对个体间的学习与更新的影响,提出一种自适应调整聚类中心向量排列顺序的方法,将个体间相似度最大的聚类中心向量尽可能排列在同一维度。通过在文本数据集上进行测试,验证了所提出的聚类中心排列调整差分进化粒子群(Index adaptive DEPSO,IDEPSO)算法在内部、外部指标上相对于其他现有算法的优势,证明了该算法的有效性和可行性。

关键词: 文本聚类, 高维度, 粒子群优化(PSO), 差分进化(DE), K-均值