Computer Engineering and Applications, 2021, Vol. 57, Issue (23): 106-112. DOI: 10.3778/j.issn.1002-8331.2011-0215

• Big Data and Cloud Computing •

Over-Sampling Method on Imbalanced Data Based on WKMeans and SMOTE

CHEN Junfeng, ZHENG Zhongtuan   

  1. School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
  • Online: 2021-12-01 Published: 2021-12-02





A new oversampling method for imbalanced data sets, WKMeans-SMOTE, is proposed based on feature weighting and clustering ensembles. It addresses the problem that SMOTE synthesizes new samples from all minority samples without any guidance. First, because different features affect the clustering result to different degrees, a clustering algorithm that assigns weights to features is adopted, and the initial cluster centers are changed repeatedly to generate different clustering results. Then, the clustering results are aligned using a cluster-matching algorithm, and the minority samples on cluster boundaries are selected by means of a clustering consistency index. Finally, SMOTE is applied to the selected minority samples, and the CART algorithm is used as the base classifier trained on the balanced data set. Experimental results show that the method achieves better classification performance in terms of F-value and G-mean than SMOTE, Borderline-SMOTE, ADASYN, and other oversampling methods.
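The selection-then-interpolation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the consistency index, so the code assumes it is the fraction of clustering runs that agree with a sample's majority label after alignment, and it assumes that low-consistency samples are the boundary samples. The feature-weighted clustering step (WKMeans) is omitted: the label arrays are taken as given. The function names `align_labels`, `consistency_index`, and `smote_from_seeds` are hypothetical, and the brute-force alignment stands in for the cluster-matching algorithm only for small numbers of clusters.

```python
from itertools import permutations

import numpy as np


def align_labels(reference, labels, n_clusters):
    """Relabel `labels` to overlap `reference` as much as possible.

    Brute force over all label permutations (assumption: n_clusters is small).
    """
    reference = np.asarray(reference)
    labels = np.asarray(labels)
    best_perm, best_overlap = None, -1
    for perm in permutations(range(n_clusters)):
        perm = np.asarray(perm)
        overlap = int(np.sum(perm[labels] == reference))
        if overlap > best_overlap:
            best_perm, best_overlap = perm, overlap
    return best_perm[labels]


def consistency_index(aligned_runs, n_clusters):
    """Per-sample fraction of runs agreeing with the majority label.

    Assumed definition; the abstract does not give the paper's formula.
    """
    aligned_runs = np.asarray(aligned_runs)  # shape (n_runs, n_samples)
    votes = np.stack(
        [(aligned_runs == c).sum(axis=0) for c in range(n_clusters)]
    )
    return votes.max(axis=0) / aligned_runs.shape[0]


def smote_from_seeds(X_min, seed_idx, n_new, k=5, rng=None):
    """SMOTE-style interpolation that starts only from selected seeds.

    X_min    : minority samples, shape (n_samples, n_features)
    seed_idx : indices of the selected (e.g. boundary) minority samples
    """
    X_min = np.asarray(X_min, dtype=float)
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples")
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        s = rng.choice(seed_idx)
        dist = np.linalg.norm(X_min - X_min[s], axis=1)
        # Drop the closest point (normally the seed itself), keep k neighbours.
        neighbours = np.argsort(dist)[1:k + 1]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic[i] = X_min[s] + gap * (X_min[nb] - X_min[s])
    return synthetic
```

In use, each clustering run would be aligned to the first one with `align_labels`, minority samples whose `consistency_index` falls below a chosen threshold would be passed as `seed_idx`, and the synthetic points returned by `smote_from_seeds` would be appended to the training set before fitting the classifier. The threshold is a free parameter in this sketch.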

Key words: imbalanced data classification, clustering ensembles, feature weighting, clustering consistency index, cluster matching, over-sampling


