Feature selection algorithm based on nearest-neighbor mutual information

Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (18): 74-78.

WANG Chenxi1, LIN Yaojin2, LIU Jinghua2, LIN Menglei2

1. Department of Computer Engineering, Zhangzhou Institute of Technology, Zhangzhou, Fujian 363000, China
2. School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China

Online: 2016-09-15 Published: 2016-09-14

Abstract

Abstract: Feature selection models for neighborhood information systems require the neighborhood parameter to be set manually. To address this problem, the distances from each sample to its nearest sample of the same class and to its nearest sample of a different class are computed and used to define the sample's nearest neighbors, which in turn determines the size of the information granule. The notion of nearest neighbor is then extended to Shannon information theory, yielding nearest-neighbor mutual information. On this basis, a forward greedy search strategy is used to construct a feature selection algorithm based on nearest-neighbor mutual information. Experiments are conducted with two different base classifiers on eight UCI data sets. The results show that, compared with several popular algorithms, the proposed model achieves higher classification performance with fewer features.

Key words: feature selection, nearest-neighbor, mutual information, neighborhood mutual information
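
The pipeline summarized in the abstract (nearest same-class and different-class distances, a neighborhood-based mutual information, forward greedy search) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact radius rule or the mutual information formula, so the sketch assumes the radius of each sample is the mean of its nearest-hit and nearest-miss distances, and it reuses the usual neighborhood mutual information form with those per-sample neighborhoods. All function names and the toy data are hypothetical.

```python
import numpy as np


def nearest_neighbor_sets(X, y):
    """Return a boolean matrix N with N[i, j] True if x_j is a neighbor of x_i.

    Assumption (not taken from the paper): the radius of x_i is the mean of
    its distance to the nearest same-class sample (hit) and to the nearest
    different-class sample (miss). Needs >= 2 classes and >= 2 samples per class.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))        # pairwise Euclidean distances
    same = y[:, None] == y[None, :]                # same-label indicator
    not_self = ~np.eye(len(y), dtype=bool)
    d_hit = np.where(same & not_self, dist, np.inf).min(axis=1)
    d_miss = np.where(~same, dist, np.inf).min(axis=1)
    radius = 0.5 * (d_hit + d_miss)                # parameter-free, per-sample radius
    return dist <= radius[:, None]


def nn_mutual_information(X, y):
    """Neighborhood-style mutual information (bits) between features X and labels y."""
    n = len(y)
    neigh = nearest_neighbor_sets(X, y)
    same = y[:, None] == y[None, :]
    size_feat = neigh.sum(axis=1)                  # neighborhood size in feature space
    size_class = same.sum(axis=1)                  # number of samples sharing x_i's label
    size_joint = (neigh & same).sum(axis=1)        # intersection, >= 1 (x_i itself)
    return -np.mean(np.log2(size_feat * size_class / (n * size_joint)))


def forward_greedy_selection(X, y, min_gain=1e-9):
    """Add the feature with the largest score until the gain falls to min_gain or below."""
    remaining = list(range(X.shape[1]))
    selected, best_score = [], 0.0
    while remaining:
        scores = [nn_mutual_information(X[:, selected + [f]], y) for f in remaining]
        k = int(np.argmax(scores))
        if scores[k] - best_score <= min_gain:
            break
        best_score = scores[k]
        selected.append(remaining.pop(k))
    return selected, best_score


# Toy data (hypothetical): feature 0 separates the two classes, features 1-2 are noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
informative = np.concatenate([np.linspace(0.0, 0.4, 20), np.linspace(0.6, 1.0, 20)])
X = np.column_stack([informative, rng.random(40), rng.random(40)])

selected, score = forward_greedy_selection(X, y)
print(selected, round(score, 3))
```

Under this assumed definition the score is bounded above by the label entropy, so the search stops as soon as adding a feature no longer increases it. The pairwise distance matrix makes the sketch quadratic in the number of samples, which is acceptable for small UCI-sized data but would need a neighbor index for larger sets.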

WANG Chenxi, LIN Yaojin, LIU Jinghua, LIN Menglei. Feature selection algorithm based on nearest-neighbor mutual information[J]. Computer Engineering and Applications, 2016, 52(18): 74-78.
