%0 Journal Article
%A WU Zhengjiang
%A YANG Tian
%A ZHENG Ailing
%A MEI Qiuyu
%A ZHANG Yaning
%T Study on Set-Valued Data Balancing Method by Semi-Monolayer Covering Rough Set
%D 2022
%R 10.3778/j.issn.1002-8331.2112-0079
%J Computer Engineering and Applications
%P 166-173
%V 58
%N 19
%X Imbalanced data arise in many application areas, and classifying them effectively has become an active research topic. Traditional over-sampling and under-sampling methods balance the data but cannot overcome the effects of data distribution and noise on classification. To reduce the influence of data distribution and noise on the classification of imbalanced data in set-valued information systems, a new method combining over-sampling and under-sampling based on the semi-monolayer covering rough set is proposed. The data are divided into two main parts by applying the semi-monolayer covering rough set [DA0] and [DE0] lower approximations: the part belonging to the lower approximation set is over-sampled by BorderlineSMOTE, the part not belonging to the lower approximation set is under-sampled by ClusterCentroids, and the two parts are then combined into the final data set. The semi-monolayer covering rough set is a computationally fast model with high approximation quality that is suitable for set-valued information systems. Its high approximation quality allows it to retain as much reliable data as possible, which helps ensure the generalization capability of the model. The hybrid approach not only reduces the impact of noisy data on BorderlineSMOTE but also largely preserves the information in the filtered-out data through ClusterCentroids. Finally, the effectiveness of the model is verified through comparative experiments using ExtraTree, DecisionTree and FGCNN.
%U http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2112-0079