Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (14): 95-102. DOI: 10.3778/j.issn.1002-8331.2006-0318

• Theory, Research and Development •

Study on K-means Clustering Algorithm of Quadratic Power Coupling

XIANG Yixuan, JIANG He, PAN Pinchen, SUN Conghui   

  1. School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
  • Online: 2021-07-15 Published: 2021-07-14

Abstract:

Clustering research usually assumes that the objects and attributes of a data set are independent and identically distributed (IID), so that they do not influence one another. In practice, however, latent relationships exist among them; that is, the data are non-IID. To better capture these relationships, the data set is subjected to a second-power (quadratic) transformation, and Pearson correlation coefficients are then computed to obtain the quadratic-power-coupled data samples. To address the sensitivity of the K-means clustering algorithm to the choice of initial centers, a density-based strategy is adopted: density parameters are computed to adjust the high-density region appropriately, and the initial centers are selected by clustering iteration. The point of maximum density in the high-density region is taken as the first initial point, and the point farthest from it is taken as the second. Clustering iterations centered on these two points yield two centroids, and the point farthest from the two centroids is taken as the third point; the remaining centers are chosen in the same way. Experimental results show that the proposed algorithm achieves higher accuracy, fewer iterations, and relatively good clustering quality.
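The quadratic power coupling step can be illustrated with a small sketch. The abstract does not give the exact coupling formula, so the rule below is an assumption: each attribute value x is expanded to the pair (x, x^2), Pearson correlation coefficients are computed between all expanded columns, and each coupled feature is a correlation-weighted sum of the expanded features. The names `pearson` and `quadratic_power_coupling` are hypothetical and not taken from the paper.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / sqrt(var_a * var_b)


def quadratic_power_coupling(data: Matrix) -> Matrix:
    """Return a coupled representation of `data` (rows are samples)."""
    # Step 1: expand every attribute value x into the pair (x, x^2).
    expanded = [[v ** p for v in row for p in (1, 2)] for row in data]
    # Step 2: Pearson correlation between every pair of expanded columns.
    columns = [list(col) for col in zip(*expanded)]
    m = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(m)] for i in range(m)]
    # Step 3 (assumed coupling rule): each coupled feature j is the sum of
    # the expanded features weighted by their correlation with column j.
    return [
        [sum(row[i] * corr[i][j] for i in range(m)) for j in range(m)]
        for row in expanded
    ]
```

With d original attributes, this sketch yields 2d coupled features per sample, which can then be passed to any K-means implementation in place of the raw data.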

Key words: non-IID (non-independent and identically distributed), quadratic power coupling, Pearson correlation coefficient, clustering iteration, K-means clustering algorithm
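The initial-center selection described in the abstract can be sketched as follows. Several details are not stated there and are assumed here: the density parameter of a point is the number of points within the mean pairwise distance, the adjustment of the high-density region is omitted (the densest point is picked directly), and "farthest from the centroids" is read as maximizing the minimum distance to any current centroid. All function names are hypothetical.

```python
from math import dist  # requires Python 3.8+
from typing import List, Tuple

Point = Tuple[float, ...]


def densest_point(points: List[Point]) -> Point:
    """Point with the most neighbours within the mean pairwise distance."""
    n = len(points)  # assumes n >= 2
    # Assumed density radius: mean distance over all distinct ordered pairs.
    radius = sum(dist(p, q) for p in points for q in points) / (n * (n - 1))
    return max(points, key=lambda p: sum(dist(p, q) <= radius for q in points))


def lloyd(points: List[Point], centers: List[Point], iters: int = 100) -> List[Point]:
    """Plain K-means iterations from the given centers; returns the centroids."""
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old center if a cluster is empty.
        updated = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers


def select_initial_centers(points: List[Point], k: int) -> List[Point]:
    """Grow the center set one point at a time, re-clustering after each addition."""
    centers = [tuple(densest_point(points))]
    while len(centers) < k:
        # Next seed: the point farthest from all current centroids.
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers = lloyd(points, centers + [tuple(farthest)])
    return centers
```

The returned centroids would then serve as the starting centers for the final K-means run. Because every step is deterministic, repeated runs on the same data give the same starting centers, unlike random initialization.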