Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (8): 265-270. DOI: 10.3778/j.issn.1002-8331.1611-0035

• Engineering and Applications •

Community discovery based on improved clustering algorithm with central constraints

XIA Yangyang, LIU Yuan, HUANG Yadong

  1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2018-04-15 Published: 2018-05-02

Abstract: Community discovery begins with a random walk started from a given node. The symmetric social distance between two nodes is computed from this walk and used to analyze the correlation between the two user nodes. Relationships in social networks are unevenly distributed: some individuals are very densely connected, while others are extremely sparsely connected, so the resulting virtual communities must be mined with dedicated community discovery techniques. Earlier work performed community discovery by applying the possibilistic C-means clustering algorithm (PCM) to preprocessed social distances. However, the accuracy index used to evaluate virtual community algorithms shows that its clustering results are unsatisfactory for large data sets with strong data stickiness. Because the quality of the cluster centers directly determines clustering performance, the PCM algorithm is improved with a cluster-center constraint method. The resulting algorithm is better suited to real network data sets, which may contain missing data or large amounts of noise and outliers. Experiments on real data sets validate the method using the accuracy index.

Key words: symmetric social distance, random walk, possibilistic C-means (PCM) algorithm, accuracy index
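
The abstract names the possibilistic C-means algorithm (PCM) as the baseline that the authors improve. For orientation, a minimal sketch of the standard PCM updates in Python might look as follows. It is not the authors' method: it assumes plain Euclidean feature vectors rather than the paper's symmetric social distances, takes the per-cluster scale parameters `etas` as given, and omits the center constraint, whose exact form the abstract does not state. The names `sq_dist`, `typicalities`, and `pcm` are illustrative only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def typicalities(points, centers, etas, m):
    """Standard PCM typicality: u_ij = 1 / (1 + (d_ij^2 / eta_i)^(1/(m-1)))."""
    exponent = 1.0 / (m - 1.0)
    return [
        [1.0 / (1.0 + (sq_dist(x, c) / eta) ** exponent) for x in points]
        for c, eta in zip(centers, etas)
    ]


def pcm(points, centers, etas, m=2.0, iters=50):
    """Standard possibilistic C-means (no center constraint; assumption).

    points  : list of feature vectors (tuples of floats)
    centers : initial cluster centers, one tuple per cluster
    etas    : positive scale parameter eta_i for each cluster
    m       : fuzzifier, must be greater than 1
    Returns (centers, u) where u[i][j] is the typicality of point j
    for cluster i, evaluated at the final centers.
    """
    centers = [tuple(c) for c in centers]
    dim = len(points[0])
    for _ in range(iters):
        u = typicalities(points, centers, etas, m)
        new_centers = []
        for row in u:
            # Each center is the mean of the points weighted by u_ij^m.
            weights = [v ** m for v in row]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * x[k] for w, x in zip(weights, points)) / total
                for k in range(dim)
            ))
        centers = new_centers
    return centers, typicalities(points, centers, etas, m)


# Toy data (made up for illustration): two well-separated groups.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
final_centers, u = pcm(pts, [(0.0, 0.0), (10.0, 10.0)], [1.0, 1.0])
```

Unlike fuzzy C-means, the typicalities of a point are not forced to sum to one across clusters, so a point far from every center receives a low value everywhere. This is why the position of each center matters so much for the result, which is the property the abstract's center constraint targets.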