Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (10): 169-178. DOI: 10.3778/j.issn.1002-8331.1804-0054


DP Clustering, Creditability Weighted Fuzzy Support Vector Machine

SHENG Xiaoxia1, YANG Zhimin2, WANG Tiantian1   

  1. College of Science, Zhejiang University of Technology, Hangzhou 310023, China
  2. Zhijiang College, Zhejiang University of Technology, Hangzhou 310024, China
  • Online: 2019-05-15 Published: 2019-05-13

DP聚类的可信性加权模糊支持向量机

盛晓遐1,杨志民2,王甜甜1   

  1. 浙江工业大学 理学院,杭州 310023
  2. 浙江工业大学 之江学院,杭州 310024

Abstract: The support vector machine (SVM) classifies relatively poorly when the data contain outliers or are unbalanced. A membership-weighted fuzzy support vector machine was previously proposed for unbalanced classification, but its fuzzy membership does not measure well how much each sample contributes to determining the optimal separating hyperplane. To address this, a creditability weighted fuzzy support vector machine based on Density Peaks (DP) clustering is proposed. First, outliers are identified by DP clustering and eliminated. Next, the initial degree of membership is built from the distance between each sample and the hyperplane determined by DEC (Different Error Costs), and the degree of membership is then updated with the improved FSVM-CIL (Fuzzy Support Vector Machines for Class Imbalance Learning). Finally, some samples are removed, which reduces the sample size and lessens the impact of data imbalance. Experiments verify the effectiveness of the proposed algorithm.

Key words: outliers, unbalanced data, Density Peaks (DP), weighted fuzzy support vector machine, fuzzy membership, creditability
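
The outlier-elimination step in the abstract can be illustrated with a minimal sketch of the two quantities used in density-peaks clustering: a local density rho and the distance delta to the nearest point of higher density. The abstract does not state how outliers are selected from these quantities, so the rule below (low rho combined with high delta) and the names `d_c`, `rho_max` and `delta_min` are assumptions made for illustration only, not the authors' criterion.

```python
import math


def density_peak_scores(points, d_c):
    """Return (rho, delta) for every point.

    rho[i]   : number of other points closer than the cutoff distance d_c.
    delta[i] : distance to the nearest point with strictly higher rho;
               for points of maximal rho, the largest distance to any point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def flag_outliers(points, d_c, rho_max, delta_min):
    """Assumed rule: a point is an outlier if it is sparse AND isolated."""
    rho, delta = density_peak_scores(points, d_c)
    return [
        i for i in range(len(points))
        if rho[i] <= rho_max and delta[i] >= delta_min
    ]


if __name__ == "__main__":
    # Four tightly grouped points and one isolated point (toy data).
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    print(flag_outliers(data, d_c=0.5, rho_max=0, delta_min=1.0))
```

On this toy data the isolated point has no neighbours within `d_c` and lies far from every denser point, so it is the only index flagged. In practice the cutoff and thresholds would have to be tuned per dataset.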

摘要: 由于SVM(Support Vector Machine)在有离群点和不平衡数据的问题中分类性能相对较低,有研究者提出了一种面向不均衡分类的隶属度加权模糊支持向量机,只是文中的模糊隶属度并不能较好衡量样本点对确定最佳分划超平面所做的贡献大小。针对以上问题提出了密度峰(Density Peaks,DP)聚类的可信性加权模糊支持向量机。首先由DP聚类找到离群点后剔除。再根据点到由DEC(Different Error Costs)确定的超平面的距离,得到初始隶属度,并用改进的FSVM-CIL(Fuzzy Support Vector Machines for Class Imbalance Learning)更新隶属度。之后剔除部分样本点,起到简约样本的作用,并减少数据不平衡带来的影响。通过实验验证了所提出算法的有效性。

关键词: 离群点, 不平衡数据, 密度峰(DP), 加权模糊支持向量机, 模糊隶属度, 可信性
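
The membership step (distance to a hyperplane, then class-dependent weighting) can be sketched as follows. The abstract does not give the paper's formulas, so this sketch substitutes the linear-decay membership commonly associated with the original FSVM-CIL, f = 1 - d / (d_max + delta), scaled by 1 for the minority class and by the minority-to-majority ratio for the majority class. The hyperplane parameters `w` and `b` are hypothetical placeholders for whatever a DEC-trained linear SVM would produce, and the paper's "improved" update rule is not reproduced here.

```python
import math


def hyperplane_distance(x, w, b):
    """Euclidean distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm


def linear_decay_membership(distances, labels, delta=1e-6):
    """Linear-decay fuzzy membership in the style of FSVM-CIL (assumed form).

    distances : non-negative distance of each sample to the hyperplane
    labels    : +1 for the minority class, -1 for the majority class
    delta     : small positive constant that keeps memberships above zero
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == -1)
    r_neg = n_pos / n_neg  # assumes the positive class is the minority
    d_max = max(distances)
    memberships = []
    for d, y in zip(distances, labels):
        decay = 1.0 - d / (d_max + delta)  # closer to hyperplane => larger
        memberships.append(decay * (1.0 if y == 1 else r_neg))
    return memberships


if __name__ == "__main__":
    w, b = (1.0, 0.0), 0.0  # hypothetical hyperplane: x1 = 0
    X = [(1.0, 0.0), (-1.0, 0.0), (-2.0, 1.0), (-4.0, 0.0)]
    y = [1, -1, -1, -1]
    d = [hyperplane_distance(x, w, b) for x in X]
    print(linear_decay_membership(d, y))
```

Samples far from the hyperplane receive memberships close to zero. One plausible reading of the final sample-removal step is that such samples are the ones discarded, but the abstract does not say which samples are removed.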