Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (18): 74-78.

• Big Data and Cloud Computing •

Feature selection algorithm based on nearest-neighbor mutual information

WANG Chenxi1, LIN Yaojin2, LIU Jinghua2, LIN Menglei2   

  1. Department of Computer Engineering, Zhangzhou Institute of Technology, Zhangzhou, Fujian 363000, China
  2. School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China
  • Online: 2016-09-15 Published: 2016-09-14

Abstract: Feature selection models for neighborhood information systems require the neighborhood parameter to be set manually. To address this problem, the paper first computes the distance from each sample to its nearest sample of the same class and to its nearest sample of a different class, uses these distances to define the sample's nearest neighbor, and thereby determines the size of the information granule. Second, the notion of nearest neighbor is extended to Shannon information theory, yielding the concept of nearest-neighbor mutual information. Third, a forward greedy search strategy is used to construct a feature selection algorithm based on nearest-neighbor mutual information. Finally, experiments are conducted on eight UCI data sets with two different base classifiers. The results show that, compared with several popular algorithms, the proposed algorithm achieves higher classification performance with fewer features.

Key words: feature selection, nearest-neighbor, mutual information, neighborhood mutual information
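
The two procedural steps named in the abstract, computing each sample's distance to its nearest same-class and nearest different-class sample and running a forward greedy search, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formula for nearest-neighbor mutual information, so the score used here (margin_score, a mean "nearest-miss minus nearest-hit" margin) is a hypothetical stand-in, and the function names, the Euclidean distance, the stopping rule, and the toy data are all assumptions.

```python
import math


def nearest_hit_miss(X, y):
    """For each sample, return (distance to its nearest same-class sample,
    distance to its nearest different-class sample).
    Euclidean distance is an assumption; math.inf means no such sample exists."""
    result = []
    for i, xi in enumerate(X):
        hit, miss = math.inf, math.inf
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = math.dist(xi, xj)
            if y[i] == y[j]:
                hit = min(hit, d)
            else:
                miss = min(miss, d)
        result.append((hit, miss))
    return result


def margin_score(X, y, subset):
    """Hypothetical stand-in for the paper's nearest-neighbor mutual information:
    mean (nearest-miss - nearest-hit) distance on the selected features."""
    projected = [[row[f] for f in subset] for row in X]
    pairs = nearest_hit_miss(projected, y)
    return sum(miss - hit for hit, miss in pairs) / len(pairs)


def forward_greedy_select(n_features, score, max_features):
    """Forward greedy search: repeatedly add the feature whose inclusion gives
    the highest score; stop when no candidate improves on the current best."""
    selected, best = [], -math.inf
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        scored = [(score(selected + [f]), f) for f in candidates]
        top_score, top_feature = max(scored, key=lambda t: t[0])
        if top_score <= best:
            break
        selected.append(top_feature)
        best = top_score
    return selected


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X_toy = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 0.9]]
y_toy = [0, 0, 1, 1]
selected = forward_greedy_select(2, lambda s: margin_score(X_toy, y_toy, s), 2)
print(selected)
```

Because the search takes the score as a callable, the stand-in margin could be replaced by an actual nearest-neighbor mutual information estimate once its definition is taken from the full paper.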