Boundary Domain Solving and Classification Method of Three-Way Decisions Based on Multi-Objective

doi:10.3778/j.issn.1002-8331.2312-0230

Abstract

Abstract: Three-way decisions assign uncertain samples to a boundary domain for delayed decision, but the thresholds that delimit this domain must be determined from a loss function. The loss function usually requires prior knowledge and is therefore somewhat subjective, which limits how well the boundary domain can be delimited. To address this problem, a multi-objective method for solving the boundary domain in three-way decisions is constructed, with the aim of delimiting the boundary domain better and improving classification performance. First, Bayes' rule is used to obtain the conditional probability of each sample. Then three objectives are set: reducing the uncertainty of the boundary domain, reducing the size of the boundary domain, and reducing the misclassification rate over the whole decision region. The optimal thresholds are found with TOPSIS (technique for order preference by similarity to an ideal solution) combined with the entropy weight method: the entropy weight method computes the weights of the three objectives, the optimal thresholds are obtained, and the resulting boundary domain is held for delayed decision. Finally, different classifiers are used to classify the boundary domain. Comparative experiments on UCI data sets show, in terms of classification accuracy and F1 score, that the thresholds learned by this method divide the boundary domain reasonably and that the resulting model achieves better classification performance.

Key words: classification uncertainty, three-way decisions, boundary domain, multi-objective, optimal threshold
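
The threshold search outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact objective formulas, so the sketch assumes binary Shannon entropy for boundary uncertainty, the boundary fraction for size, and the share of wrongly accepted or rejected samples for the misclassification rate. The grid search, the min-max normalization used in place of TOPSIS's usual vector normalization, and all function names (`partition`, `objectives`, `entropy_weights`, `topsis`, `select_thresholds`) are likewise illustrative assumptions.

```python
import math


def partition(probs, alpha, beta):
    """Three-way split: accept if p >= alpha, reject if p <= beta, else defer."""
    pos = [i for i, p in enumerate(probs) if p >= alpha]
    neg = [i for i, p in enumerate(probs) if p <= beta]
    bnd = [i for i, p in enumerate(probs) if beta < p < alpha]
    return pos, neg, bnd


def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def objectives(probs, labels, alpha, beta):
    """Three cost objectives (assumed forms): uncertainty, size, error rate."""
    pos, neg, bnd = partition(probs, alpha, beta)
    n = len(probs)
    uncertainty = sum(binary_entropy(probs[i]) for i in bnd) / n
    size = len(bnd) / n
    errors = sum(labels[i] == 0 for i in pos) + sum(labels[i] == 1 for i in neg)
    return [uncertainty, size, errors / n]


def entropy_weights(matrix):
    """Return benefit-normalized columns and entropy weights (needs >= 2 rows)."""
    m, k = len(matrix), len(matrix[0])
    cols = []
    for j in range(k):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        span = hi - lo
        # All objectives are costs, so flip them: 1.0 is best, 0.0 is worst.
        cols.append([(hi - v) / span if span > 0 else 1.0 for v in col])
    divergences = []
    for col in cols:
        total = sum(col)
        e = -sum((v / total) * math.log(v / total) for v in col if v > 0)
        divergences.append(max(0.0, 1.0 - e / math.log(m)))
    s = sum(divergences)
    weights = [d / s if s > 0 else 1.0 / k for d in divergences]
    return cols, weights


def topsis(cols, weights):
    """Relative closeness of each candidate to the ideal solution."""
    weighted = [[w * v for v in col] for col, w in zip(cols, weights)]
    best = [max(col) for col in weighted]
    worst = [min(col) for col in weighted]
    scores = []
    for i in range(len(cols[0])):
        d_plus = math.sqrt(sum((c[i] - b) ** 2 for c, b in zip(weighted, best)))
        d_minus = math.sqrt(sum((c[i] - w) ** 2 for c, w in zip(weighted, worst)))
        denom = d_plus + d_minus
        scores.append(d_minus / denom if denom > 0 else 0.0)
    return scores


def select_thresholds(probs, labels, step=0.05):
    """Grid-search (alpha, beta) with beta < alpha and rank by TOPSIS score."""
    grid = [round(i * step, 10) for i in range(1, int(round(1 / step)))]
    candidates = [(a, b) for a in grid for b in grid if b < a]
    matrix = [objectives(probs, labels, a, b) for a, b in candidates]
    cols, weights = entropy_weights(matrix)
    scores = topsis(cols, weights)
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], weights


if __name__ == "__main__":
    # Toy conditional probabilities and labels, made up for illustration only.
    probs = [0.95, 0.9, 0.8, 0.6, 0.55, 0.45, 0.4, 0.2, 0.1, 0.05]
    labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
    (alpha, beta), weights = select_thresholds(probs, labels)
    print("alpha, beta:", alpha, beta)
    print("objective weights:", weights)
```

In this sketch, samples falling between `beta` and `alpha` form the boundary domain and would be passed to a second-stage classifier, which corresponds to the delayed-decision step in the abstract. Because the entropy weights are derived from how much each objective varies across the candidate thresholds, an objective that is nearly constant over the grid contributes little to the ranking.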

NIE Bin, JIN Haike, DU Jianqiang, ZHANG Yuchao, ZHENG Xuepeng, CHEN Xingxin, MIAO Zhen. Boundary Domain Solving and Classification Method of Three-Way Decisions Based on Multi-Objective[J]. Computer Engineering and Applications, 2024, 60(19): 97-109.
