Computer Engineering and Applications (计算机工程与应用), 2024, Vol. 60, Issue 4: 89-98. DOI: 10.3778/j.issn.1002-8331.2306-0230

• Theory, Research and Development •

Feature Selection Algorithm Using Dynamic Relevance Weight

XU Huajie, LIU Guanting, ZHANG Pin, QIN Yuanzhuo   

  1. College of Computer and Electronic Information, Guangxi University, Nanning 530004, China
  2. Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, China
  3. Beibu Gulf Port Fangchenggang Terminal Co., Ltd., Fangchenggang, Guangxi 538001, China
  4. College of Civil Engineering and Architecture, Guangxi University, Nanning 530004, China
  • Online: 2024-02-15 Published: 2024-02-15

Abstract: When assessing the new classification information that a candidate feature provides, feature selection algorithms based on mutual information usually ignore the fact that adding the candidate feature changes the relevance between the already-selected features and the class labels, and that this change contributes additional new information. Moreover, computing redundancy as a cumulative sum may underestimate how redundant a candidate feature is. To address these problems, a dynamic relevance weight is defined to account more fully for the new information that a candidate feature brings, and an improved redundancy term is defined that uses a maximum-value and normalization strategy to avoid the underestimation of redundancy found in traditional algorithms. On this basis, a feature selection algorithm using dynamic relevance weight (FSDRW) is proposed. FSDRW is compared experimentally with five current mainstream filter feature selection algorithms based on mutual information. Experiments on machine learning benchmark datasets from the University of California, Irvine (UCI) and Arizona State University (ASU) show that the proposed algorithm performs well in both classification accuracy and overall performance. Finally, the proposed algorithm is applied to the recognition of microseismic and blasting signals in a reservoir project in Guangxi. The features it selects achieve a classification accuracy of 98.86% for microseismic signal recognition, which verifies the effectiveness of the algorithm in practical applications.

Key words: feature selection, mutual information, information entropy, dynamic relevance weight
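
The abstract's point about summed redundancy can be illustrated with a small sketch. The code below is not the FSDRW definition, which the abstract does not give. It only contrasts an averaged-sum redundancy (in the style of mRMR-type criteria) with a maximum-value redundancy normalized by the candidate's entropy; that normalizer and the function names `averaged_redundancy` and `max_normalized_redundancy` are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def averaged_redundancy(candidate, selected):
    """Sum of I(candidate; s) over the selected features, divided by their count."""
    return sum(mutual_information(candidate, s) for s in selected) / len(selected)


def max_normalized_redundancy(candidate, selected):
    """Largest I(candidate; s), normalized by H(candidate).

    Normalizing by the candidate's entropy is an illustrative assumption,
    not necessarily the normalization used in the paper.
    """
    h = entropy(candidate)
    if h == 0:
        return 0.0
    return max(mutual_information(candidate, s) for s in selected) / h


# Three mutually independent binary features that are already selected.
s1 = [0, 0, 0, 0, 1, 1, 1, 1]
s2 = [0, 0, 1, 1, 0, 0, 1, 1]
s3 = [0, 1, 0, 1, 0, 1, 0, 1]
selected = [s1, s2, s3]

# The candidate is an exact copy of s1, so it is fully redundant.
candidate = list(s1)

print(averaged_redundancy(candidate, selected))
print(max_normalized_redundancy(candidate, selected))
```

In this toy case the candidate duplicates one selected feature and is independent of the other two, so the averaged sum reports about one third of a bit while the maximum-value, normalized score reports 1.0. This is the kind of underestimation that the improved redundancy term is meant to avoid.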