Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (9): 65-74. DOI: 10.3778/j.issn.1002-8331.1911-0347

• Theory, Research and Development •

Semi-Supervised Classification Method of Possibilistic Clustering Assumption

DAN Yufang, TAO Jianwen, XU Haote   

  1. School of Electronics and Information Engineering, Ningbo Polytechnic, Ningbo, Zhejiang 315800, China
  2. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China
  3. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Online: 2020-05-01 Published: 2020-04-29

Abstract:

Graph-based Semi-Supervised Learning (GSSL) has attracted growing attention in the machine learning community for its intuitiveness and good learning performance. Existing GSSL methods, which build an undirected weighted graph, are sensitive to noise and abnormal data. To address this problem, a Semi-Supervised Classification Method of Possibilistic Clustering Assumption (SSPCA) is proposed. To improve classification reliability, SSPCA constrains each data point to have the same label membership value as its local weighted mean. In addition, a fuzzy entropy regularization term is introduced into the objective function. By increasing the discriminative information carried by the samples, this term strengthens the generalization ability of the label membership function and improves the robustness of the method to noise and abnormal data. Extensive experiments on real-world datasets confirm that the proposed method achieves good classification reliability and robustness.

Key words: possibilistic clustering, semi-supervised classification, membership function, local weighted mean, fuzzy entropy
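
The two ingredients named in the abstract, the local weighted mean constraint and the fuzzy entropy term, can be illustrated with a minimal sketch. It is not the authors' formulation: the Gaussian k-nearest-neighbour weights, the Shannon-style entropy, and the names `knn_weights`, `consistency_penalty`, and `fuzzy_entropy` are all assumptions made here for illustration, since the abstract does not give the objective function.

```python
import numpy as np

def knn_weights(X, k=3, sigma=1.0):
    """Row-normalised Gaussian weights over each point's k nearest neighbours.

    Assumption: the abstract does not say how local weights are built;
    a Gaussian kernel on a kNN graph is one common choice.
    """
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                 # a point is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]          # k nearest neighbours per row
    nd2 = np.take_along_axis(d2, idx, axis=1)
    # Subtracting the row minimum avoids underflow and cancels on normalisation.
    w = np.exp(-(nd2 - nd2[:, :1]) / (2.0 * sigma ** 2))
    W = np.zeros((n, n))
    np.put_along_axis(W, idx, w / w.sum(axis=1, keepdims=True), axis=1)
    return W

def consistency_penalty(f, X, W):
    """Squared gap between f at each point and f at its local weighted mean.

    f is any membership function mapping an (n, d) array to (n, c) memberships.
    The penalty is zero exactly when the constraint in the abstract holds.
    """
    M = W @ X                                    # local weighted means, one per row
    return float(((f(X) - f(M)) ** 2).sum())

def fuzzy_entropy(U, eps=1e-12):
    """Shannon-style entropy of a membership matrix with entries in [0, 1].

    Assumption: one common form of fuzzy entropy; the paper's term may differ.
    """
    U = np.clip(U, eps, 1.0)                     # guard against log(0)
    return float(-(U * np.log(U)).sum())
```

Under these assumptions, an objective in this spirit would combine a fitting loss on the labelled points with `consistency_penalty` and a weighted entropy term. The sign and weight of the entropy term depend on the actual formulation, which the abstract does not specify.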