Computer Engineering and Applications, 2021, Vol. 57, Issue (1): 188-193. DOI: 10.3778/j.issn.1002-8331.2004-0016

• Pattern Recognition and Artificial Intelligence •

K-Means Text Clustering Based on Improved Gray Wolf Optimization Algorithm

PAN Chengsheng, ZHANG Bin, LYU Yana, DU Xiuli, QIU Shaoming

  1. Key Laboratory of Communication and Network, Dalian University, Dalian, Liaoning 116622, China
  • Publication date: 2021-01-01 Release date: 2020-12-31

Abstract:

The K-Means algorithm tends to fall into local optima during text clustering, which makes the clustering results inaccurate. To address this problem, a K-Means text clustering method based on an improved gray wolf optimization algorithm is proposed. The text data are first segmented into words, stripped of stop words, passed through feature extraction, and vectorized. Elite individuals are then selected by immune clonal selection and explored in depth to increase the diversity of the gray wolf population and avoid premature convergence. Next, the particle swarm position update idea is combined with the gray wolf position update to reduce the risk of the gray wolf optimization algorithm being trapped in local extrema. Finally, the improved gray wolf optimization algorithm is combined with the K-Means algorithm for text clustering. Compared with the K-Means, GWO-KMeans, and IPSK-Means algorithms, the proposed algorithm achieves clearly higher accuracy, recall, and F-value on average, and its text clustering results are more reliable.

Key words: K-Means algorithm, text clustering, gray wolf optimization algorithm, immune clone, particle swarm
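
The abstract gives no update equations, so the following is only a rough sketch of the baseline GWO-KMeans idea that the paper builds on. The standard gray wolf position update, guided by the three best wolves (alpha, beta, delta), searches for centroids that minimize the within-cluster sum of squared errors, and plain K-Means iterations then refine them. The immune clonal selection and the particle-swarm-style update proposed by the authors are not reproduced here. All function names, parameter values, and the toy 2-D data are assumptions made for illustration.

```python
import random


def sse(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
        for p in points
    )


def unflatten(position, dim):
    """Split a flat wolf position into centroids of length dim."""
    return [position[i:i + dim] for i in range(0, len(position), dim)]


def gwo_centroids(points, k, n_wolves=12, n_iter=60, seed=0):
    """Baseline GWO search for k centroids (not the paper's improved variant)."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    size = k * dim  # each wolf encodes k centroids as one flat vector

    def fitness(wolf):
        return sse(unflatten(wolf, dim), points)

    wolves = [[rng.uniform(lo[j % dim], hi[j % dim]) for j in range(size)]
              for _ in range(n_wolves)]
    best, best_fit = None, float("inf")
    for t in range(n_iter):
        wolves.sort(key=fitness)
        if fitness(wolves[0]) < best_fit:  # remember the best wolf seen so far
            best, best_fit = wolves[0][:], fitness(wolves[0])
        leaders = [w[:] for w in wolves[:3]]  # alpha, beta, delta
        a = 2.0 * (1 - t / n_iter)  # control parameter decays from 2 toward 0
        for wolf in wolves:
            for j in range(size):
                x = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[j] - wolf[j])
                    x += leader[j] - A * D
                # average of the three leader-guided moves, clamped to data bounds
                wolf[j] = min(max(x / 3, lo[j % dim]), hi[j % dim])
    final = min(wolves, key=fitness)
    if fitness(final) < best_fit:
        best = final[:]
    return unflatten(best, dim)


def kmeans_refine(points, centroids, n_iter=10):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    dim = len(points[0])
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids


if __name__ == "__main__":
    # Toy 2-D "document vectors"; real input would be vectorized text features.
    data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
    seeds = gwo_centroids(data, k=2)
    print(kmeans_refine(data, seeds))
```

In this sketch the swarm search only supplies starting centroids. The K-Means refinement can never increase the sum of squared errors, so the final result is at least as good as the seed under that criterion.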