Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (2): 146-150.


Research on tags co-occurrence for tags clustering algorithm

WANG Yadan, LI Peng, JIN Yu, LIU Yu   

  1. College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China
  2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430065, China
  • Online: 2015-01-15 Published: 2015-01-12

Abstract: In social networks, tag clustering can address problems such as tag redundancy and semantic ambiguity. To improve clustering effectiveness, this paper derives a feature vector for each tag from tag co-occurrence information and computes tag similarity from these feature vectors. Where traditional clustering algorithms use geometric distance to measure how far an object is from a cluster center, the proposed method uses the Pearson correlation coefficient instead. Combining this measure with the K-means algorithm yields a tag co-occurrence clustering algorithm, and the complexity of the algorithm is analyzed. Finally, comparative experiments against other clustering algorithms show that the proposed algorithm clusters better than the alternatives, which verifies its effectiveness and feasibility.

Key words: tag clustering, tag co-occurrence, K-means, Pearson correlation coefficient, feature vector
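
The approach outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the feature vectors are built or how the cluster centers are updated, so the sketch assumes that each tag's feature vector is its row of co-occurrence counts and that centers are updated as the mean of their members. The function names `cooccurrence_vectors`, `pearson`, and `pearson_kmeans`, the `init` parameter, and the sample data are all hypothetical.

```python
import numpy as np

def cooccurrence_vectors(resources):
    """Return (tags, matrix); matrix[i, j] counts resources carrying both tag i and tag j.

    Assumption: row i of the matrix serves as the feature vector of tag i.
    """
    tags = sorted({t for r in resources for t in r})
    index = {t: i for i, t in enumerate(tags)}
    m = np.zeros((len(tags), len(tags)))
    for r in resources:
        ids = [index[t] for t in set(r)]
        for i in ids:
            for j in ids:
                if i != j:
                    m[i, j] += 1
    return tags, m

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc @ xc) * (yc @ yc))
    return float(xc @ yc / denom) if denom > 0 else 0.0

def pearson_kmeans(vectors, k, init=None, n_iter=50, seed=0):
    """K-means-style loop that assigns each vector to the most correlated center."""
    vectors = np.asarray(vectors, dtype=float)
    if init is None:
        # Assumption: random distinct rows as initial centers.
        rng = np.random.default_rng(seed)
        init = rng.choice(len(vectors), size=k, replace=False)
    centers = vectors[list(init)].copy()
    labels = np.full(len(vectors), -1)
    for _ in range(n_iter):
        # Assignment step: highest Pearson correlation replaces smallest distance.
        new_labels = np.array(
            [int(np.argmax([pearson(v, c) for c in centers])) for v in vectors]
        )
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step (assumed): each center becomes the mean of its members.
        for j in range(k):
            members = vectors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

# Made-up tagging data: each inner list is the tag set of one resource.
resources = [
    ["python", "numpy", "pandas"],
    ["python", "numpy", "pandas"],
    ["python", "numpy"],
    ["guitar", "piano", "drums"],
    ["guitar", "piano", "drums"],
    ["guitar", "drums"],
]
tags, vectors = cooccurrence_vectors(resources)
# Seed one center from each topic so the run is deterministic.
labels = pearson_kmeans(vectors, k=2, init=[tags.index("guitar"), tags.index("python")])
cluster_of = dict(zip(tags, labels.tolist()))
```

On this toy data the programming tags and the music tags end up in separate clusters. Because the assignment step maximizes correlation rather than minimizing Euclidean distance, two tags with similar co-occurrence patterns can be grouped together even when their raw counts differ in scale.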
