Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (33): 188-193.


Semi-supervised fuzzy C-means algorithm with maximum center distance

YAO Ziyang   

  1. School of Internet of Things, Wuxi Professional College of Science and Technology, Wuxi, Jiangsu 214028, China
  • Online: 2012-11-21  Published: 2012-11-20

Semi-supervised center-maximization fuzzy C-means algorithm (半监督中心最大化模糊C均值算法)

姚紫阳   

  1. 无锡科技职业学院 物联网技术学院,江苏 无锡 214028

Abstract: In pattern recognition, data analysis methods are generally divided into supervised and unsupervised learning. Neither fits practical applications well, because data collected in production is rarely completely unlabeled or completely labeled. Moreover, the many interference factors in real production mean that the collected samples usually contain noise, which strongly affects traditional analysis methods, clustering methods in particular. To address these two problems, this paper builds on the classical unsupervised fuzzy C-means (FCM) clustering algorithm and constructs a new algorithm by introducing a semi-supervised membership compensation term and a center-distance maximization term that weakens the influence of interference points. The new algorithm is called the semi-supervised fuzzy C-means algorithm with maximum center distance, SCM-FCM for short. Experimental results on UCI data sets show that the proposed algorithm performs better than traditional unsupervised clustering methods.

Key words: semi-supervised, maximum center, fuzzy C-means (FCM), noise immunity
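
For orientation, the classical FCM baseline that the abstract builds on can be sketched as follows. This is only a minimal illustration of standard unsupervised FCM, written in plain Python. The abstract does not give the form of the membership compensation term or the center-distance maximization term, so neither is implemented here. The function name `fcm` and the parameters `m`, `iters`, `tol` and `seed` are assumptions made for this sketch, not names from the paper.

```python
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Classical (unsupervised) fuzzy C-means on a list of equal-length tuples.

    Returns (centers, memberships); memberships[i][j] is the degree to which
    points[i] belongs to cluster j, and each row sums to 1.
    This is the baseline only: the SCM-FCM terms are NOT included.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim))
                  for j in range(c)]
            if min(d2) == 0.0:
                # The point sits exactly on one or more centers.
                hits = [j for j in range(c) if d2[j] == 0.0]
                new = [1.0 / len(hits) if j in hits else 0.0 for j in range(c)]
            else:
                inv = [dj ** (-1.0 / (m - 1.0)) for dj in d2]  # d2 is squared
                s = sum(inv)
                new = [v / s for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u


# Toy usage: two well-separated groups of 2-D points (made-up data).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u = fcm(points, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in u]
print(labels)
```

Because every point receives a nonzero membership in every cluster, a single far-away noise point still pulls each center toward itself. That sensitivity is the problem the abstract says the center-distance maximization term is meant to reduce.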

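Abstract (translated from the Chinese): In pattern recognition, data analysis methods are generally divided into supervised and unsupervised learning methods. Neither class matches practical applications: data obtained in production is never entirely without known information, nor is all of its information known. In addition, because real production involves many interference factors, the collected samples usually contain some interference information, and such data strongly affects traditional analysis methods, with clustering methods being the most sensitive. To address these two problems, this paper takes the classical unsupervised clustering algorithm FCM as its basis and, by introducing a semi-supervised membership compensation term and a center-maximization term that weakens the influence of interference points, constructs a new clustering algorithm called the semi-supervised center-maximization fuzzy C-means algorithm, SCM-FCM for short. Simulation experiments on UCI data sets show that the algorithm has better application value than traditional unsupervised clustering analysis methods.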

Key words (translated from the Chinese): semi-supervised, center maximization, fuzzy C-means algorithm (FCM), anti-interference