Computer Engineering and Applications, 2022, Vol. 58, Issue (22): 159-164. DOI: 10.3778/j.issn.1002-8331.2105-0076

• Pattern Recognition and Artificial Intelligence •

Three-Way Feature Selection Based on Neighborhood Mutual Information

ZHUO Yongtai, DONG Youming, GAO Can   

  1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
  2. Guangdong Key Laboratory of Intelligent Information Processing (Shenzhen University), Shenzhen, Guangdong 518060, China
  • Online: 2022-11-15 Published: 2022-11-15




Abstract: Feature selection is an important preprocessing step in machine learning, and neighborhood mutual information is an effective measure that can directly handle both continuous and discrete features. However, existing feature selection methods based on neighborhood mutual information usually rely on a heuristic greedy search strategy, which makes it difficult to guarantee the quality of the selected feature subset. Based on the theory of three-way decision, a three-way neighborhood mutual information feature selection method (NMI-TWD) is proposed. Three diverse candidate feature subsets are first generated in order to obtain feature subsets of higher quality. A three-way co-decision model is then built by ensembling the three feature subsets to further improve classification performance. Experiments on UCI data sets show that the proposed method achieves better reducts and classification performance than other representative methods, which validates its effectiveness.

Key words: feature selection, neighborhood rough set, neighborhood mutual information, three-way decision, ensemble learning
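
To make the baseline concrete, the following is a minimal sketch of neighborhood mutual information and the greedy forward search that the abstract describes as the usual strategy. It is an illustration under stated assumptions, not the NMI-TWD method itself: the generation of three candidate subsets and the three-way co-decision model are not specified in the abstract and are not shown. The function names, the Chebyshev distance, the neighborhood radius `delta`, and the stopping tolerance `tol` are all assumptions introduced here, and features are assumed to be scaled to a comparable range.

```python
import numpy as np

def neighborhood_mask(X, delta):
    """Boolean n x n matrix: entry (i, j) is True when sample j lies within
    Chebyshev distance delta of sample i (distance choice is an assumption)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    dist = np.abs(X[:, None, :] - X[None, :, :]).max(axis=2)
    return dist <= delta

def neighborhood_mutual_information(X, y, delta=0.1):
    """NMI between feature columns X and a discrete label vector y:
    -(1/n) * sum_i log2(|d_X(i)| * |d_y(i)| / (n * |d_X(i) & d_y(i)|))."""
    y = np.asarray(y)
    n = len(y)
    feat = neighborhood_mask(X, delta)       # neighborhoods in feature space
    label = y[:, None] == y[None, :]         # equivalence classes of the label
    joint = feat & label                     # joint neighborhoods
    n_feat = feat.sum(axis=1)
    n_label = label.sum(axis=1)
    n_joint = joint.sum(axis=1)              # >= 1: each sample contains itself
    return -np.mean(np.log2(n_feat * n_label / (n * n_joint)))

def greedy_select(X, y, delta=0.1, tol=1e-6):
    """Forward greedy search: repeatedly add the feature that raises NMI most,
    stopping when the gain is at most tol (hypothetical stopping rule)."""
    X = np.asarray(X, dtype=float)
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = [
            neighborhood_mutual_information(X[:, selected + [j]], y, delta)
            for j in remaining
        ]
        k = int(np.argmax(scores))
        if scores[k] - best <= tol:
            break
        best = scores[k]
        selected.append(remaining.pop(k))
    return selected, best
```

Because each step commits to the single best feature, the search returns only one subset and can miss better combinations, which is the limitation the abstract attributes to greedy strategies.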
