Feature selection using normalized fuzzy joint mutual information maximum

Computer Engineering and Applications, 2017, Vol. 53, Issue (22): 105-110. DOI: 10.3778/j.issn.1002-8331.1605-0293

DONG Zemin¹, SHI Qiang²

1. Research and Training Center of City College, Wuhan University of Science and Technology, Wuhan 430083, China
2. School of Software Engineering, Huazhong University of Science & Technology, Wuhan 430000, China

Online: 2017-11-15  Published: 2017-11-29

Abstract

Abstract: Feature selection picks, from the full feature set, a subset whose features are strongly relevant to the class label and minimally redundant with one another. This improves the classifier's computational efficiency and its generalization ability, and therefore its classification accuracy. In practice, however, relevance and redundancy criteria based on mutual information have three problems: (1) the probabilities of the variables are hard to estimate, which in turn makes the information entropy of a feature hard to compute; (2) mutual information is biased toward features that take many values; (3) redundancy measures that sum the redundancy between a candidate feature and the already selected subset tend to fail when the feature dimension is high. To address these problems, this paper proposes a feature evaluation criterion based on Normalized Fuzzy Joint Mutual Information Maximum (NFJMIM). First, the entropy, conditional entropy and joint entropy of the variables are computed from fuzzy equivalence relations. Second, the cumulative-sum measure is replaced by the maximum of joint mutual information, and feature importance is evaluated with the normalized joint mutual information. Finally, a feature selection algorithm based on forward greedy search is built on this criterion. Experiments on several standard data sets from the UCI Machine Learning Repository show that the algorithm effectively selects feature subsets that are informative for the class label and markedly improves classification accuracy.

Key words: fuzzy equivalence relation, joint mutual information, maximum-minimum criterion, feature selection
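The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, since the abstract does not give the formulas. The following are all assumptions made for illustration: the similarity function `1 - |x_i - x_j| / range`, the fuzzy entropy form `-mean(log2(|[x_i]| / n))` common in the fuzzy rough set literature, the choice of the joint entropy as the normalizing denominator, and every function name. The selection rule (pick the candidate whose smallest joint score with any already selected feature is largest) follows the usual reading of a joint-mutual-information-maximization criterion.

```python
import numpy as np


def fuzzy_relation(x, eps=1e-12):
    """Fuzzy similarity matrix for one numeric feature (assumed form)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span < eps:  # constant feature: every pair of samples is fully similar
        return np.ones((x.size, x.size))
    return 1.0 - np.abs(x[:, None] - x[None, :]) / span


def crisp_relation(y):
    """Equivalence relation induced by the class labels (1 if same class)."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)


def fuzzy_entropy(*relations):
    """Joint fuzzy entropy of one or more relations, intersected with min."""
    r = np.minimum.reduce(relations)
    n = r.shape[0]
    card = r.sum(axis=1)  # fuzzy cardinality of each sample's class, >= 1
    return float(-np.mean(np.log2(card / n)))


def nfjmim_select(X, y, k):
    """Forward greedy selection of k feature indices (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    rels = [fuzzy_relation(X[:, j]) for j in range(X.shape[1])]
    rc = crisp_relation(y)
    hc = fuzzy_entropy(rc)

    def score(*feats):
        # Joint mutual information I(feats; C) = H(feats) + H(C) - H(feats, C),
        # divided by H(feats, C). This normalization is an assumption.
        joint = fuzzy_entropy(*feats, rc)
        mi = fuzzy_entropy(*feats) + hc - joint
        return mi / joint if joint > 0 else 0.0

    remaining = list(range(X.shape[1]))
    # Start with the single most relevant feature.
    selected = [max(remaining, key=lambda j: score(rels[j]))]
    remaining.remove(selected[0])
    while remaining and len(selected) < k:
        # Max-min step: maximize the worst-case joint score over selected features.
        best = max(
            remaining,
            key=lambda j: min(score(rels[j], rels[s]) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with `X = [[0, 5], [0, 5], [0, 5], [1, 5], [1, 5], [1, 5]]` and `y = [0, 0, 0, 1, 1, 1]`, `nfjmim_select(X, y, 1)` returns `[0]`: the first column reproduces the class partition exactly, while the constant second column carries no information about it.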

DONG Zemin, SHI Qiang. Feature selection using normalized fuzzy joint mutual information maximum[J]. Computer Engineering and Applications, 2017, 53(22): 105-110.

