Computer Engineering and Applications, 2025, Vol. 61, Issue (15): 167-177. DOI: 10.3778/j.issn.1002-8331.2412-0388

• Theory and R&D •

Two-Stage Multi-Dimensional Classification Method Combining KNN Feature Enhancement and Mutual Information Feature Selection

LI Erchao, ZHANG Baoxin, JIA Binbin   

  1. College of Electrical Engineering and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China
  • Online: 2025-08-01  Published: 2025-07-31

Abstract: Existing feature enhancement methods for multi-dimensional classification enrich the feature space, but they lack an effective assessment of the intrinsic quality of the generated features and tend to introduce redundancy, which degrades classification performance. To address this, a two-stage multi-dimensional classification method, KMFM, which combines KNN feature enhancement with mutual information feature selection, is proposed. In the first stage, the feature space is expanded through KNN feature enhancement; in the second stage, mutual information is used to evaluate the features and select the most relevant subset, and the dependencies among class variables are taken into account by computing the combination entropy of the class space. Experimental results on ten benchmark datasets show that KMFM achieves significant improvements over existing methods in Hamming score, exact match, and sub-exact match. Across the 90 experimental configurations, KMFM obtains the best performance in 77.8% of cases. KMFM also significantly outperforms KRAM, which uses feature enhancement alone, and surpasses MIFS, which performs mutual information feature selection alone, on all nine evaluation metrics, fully demonstrating the effectiveness and generality of the algorithm.
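
To make the two-stage pipeline concrete, the sketch below is a minimal Python illustration rather than the authors' implementation: it augments every instance with the label distribution of its k nearest training neighbors (one assumed form of KNN feature enhancement) and then ranks all features by mutual information against a single variable that encodes the combination of class dimensions (a rough stand-in for scoring relevance while respecting class-space combinations). The helper names knn_augment and mi_select, the choice of neighbor statistic, and the top-n_keep selection rule are assumptions made only for illustration.

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.feature_selection import mutual_info_classif

def knn_augment(X_train, Y_train, X, k=5):
    # Stage 1 (assumed form): for every class dimension, append the label
    # distribution observed among the k nearest training neighbors of each instance.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X)                          # (n_samples, k) neighbor indices
    blocks = [X]
    for d in range(Y_train.shape[1]):
        labels = np.unique(Y_train[:, d])
        neigh = Y_train[idx, d]                        # neighbor labels, shape (n_samples, k)
        freq = np.stack([(neigh == c).mean(axis=1) for c in labels], axis=1)
        blocks.append(freq)                            # one frequency column per possible label
    return np.hstack(blocks)

def mi_select(X_aug, Y, n_keep):
    # Stage 2 (assumed form): encode the combination of all class dimensions as one
    # variable, so that dependencies between class variables enter the relevance score,
    # then keep the n_keep features with the highest mutual information.
    joint = np.unique(Y, axis=0, return_inverse=True)[1].ravel()
    scores = mutual_info_classif(X_aug, joint)
    return np.argsort(scores)[::-1][:n_keep]

# Toy usage: 100 instances, 8 numeric features, 2 class dimensions with 3 labels each.
rng = np.random.default_rng(0)
X_tr = rng.normal(size=(100, 8))
Y_tr = rng.integers(0, 3, size=(100, 2))
X_aug = knn_augment(X_tr, Y_tr, X_tr, k=5)
kept = mi_select(X_aug, Y_tr, n_keep=10)
print("augmented width:", X_aug.shape[1], "kept feature indices:", kept)

A base multi-dimensional classifier would then be trained on X_aug[:, kept]; how KMFM assesses feature quality and couples the two stages in detail is described in the paper itself.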

Key words: multi-dimensional classification, feature augmentation, feature selection, mutual information, class dependency