Computer Engineering and Applications, 2021, Vol. 57, Issue (6): 58-66. DOI: 10.3778/j.issn.1002-8331.2003-0227

• Theory, Research and Development •

Feature Selection of Markov Blanket for High Dimensional Data

LI Jingxing, YANG Youlong

  1. School of Mathematics and Statistics, Xidian University, Xi’an 710126, China
  • Online: 2021-03-15 Published: 2021-03-12

Abstract:

For the classification of high-dimensional data whose distribution does not satisfy the faithfulness condition, a new Markov blanket feature selection algorithm based on particle swarm optimization is proposed. By effectively extracting relevant features and eliminating redundant ones, it can produce better classification results. In the feature preprocessing stage, the algorithm uses the maximal information coefficient to analyze the relevance and redundancy of features, obtaining the Markov blanket representative set of the class attribute and a suboptimal feature subset. In the search and evaluation stage, particle swarm optimization with a new fitness function selects the optimal feature subset. The resulting model is then used to predict the test set. Experimental results show that the algorithm has certain advantages on twelve datasets.

Key words: feature selection, maximal information coefficient, Markov blanket representative set, suboptimal feature subset, fitness function, particle swarm optimization
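
The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's construction of the Markov blanket representative set or its fitness function, so everything here is an assumption: `dependence` uses binned normalized mutual information as a stand-in for the maximal information coefficient, `prefilter` applies a simple hypothetical relevance/redundancy threshold rule, and `make_fitness` combines leave-one-out 1-nearest-neighbour accuracy with a subset-size term. All function names, thresholds, and the toy data are made up for illustration.

```python
import math
import random
from collections import Counter


def discretize(values, bins=4):
    """Equal-width binning; a constant column maps to a single bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def dependence(x, y, bins=4):
    """Normalized mutual information in [0, 1]. ASSUMPTION: stand-in for MIC."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    hx, hy = entropy(dx), entropy(dy)
    if hx == 0 or hy == 0:
        return 0.0
    return (hx + hy - entropy(list(zip(dx, dy)))) / min(hx, hy)


def prefilter(columns, target, relevance_min=0.1, redundancy_max=0.9):
    """Stage 1 (hypothetical rule): keep relevant features, skipping any that
    is nearly redundant with a more relevant feature already kept."""
    ranked = sorted((-dependence(col, target), j) for j, col in enumerate(columns))
    kept = []
    for neg_score, j in ranked:
        if -neg_score < relevance_min:
            break
        if all(dependence(columns[j], columns[k]) < redundancy_max for k in kept):
            kept.append(j)
    return kept


def loo_accuracy(columns, target, subset):
    """Leave-one-out 1-nearest-neighbour accuracy on the chosen features."""
    n = len(target)
    rows = [[columns[j][i] for j in subset] for i in range(n)]
    hits = 0
    for i in range(n):
        def dist(k):
            return sum((a - b) ** 2 for a, b in zip(rows[i], rows[k]))
        nearest = min((k for k in range(n) if k != i), key=dist)
        hits += target[nearest] == target[i]
    return hits / n


def make_fitness(columns, target, alpha=0.9):
    """Hypothetical fitness: reward accuracy, mildly reward smaller subsets."""
    def fitness(subset):
        if not subset:
            return 0.0
        size_term = 1 - len(subset) / len(columns)
        return alpha * loo_accuracy(columns, target, subset) + (1 - alpha) * size_term
    return fitness


def binary_pso(candidates, fitness, particles=10, iterations=20, seed=0):
    """Stage 2: binary PSO (sigmoid transfer) over the pre-filtered candidates."""
    rng = random.Random(seed)
    dim = len(candidates)
    def decode(bits):
        return [c for c, b in zip(candidates, bits) if b]
    pos = [[rng.random() < 0.5 for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(decode(p)) for p in pos]
    g = max(range(particles), key=best_fit.__getitem__)
    gbest, gfit = best_pos[g][:], best_fit[g]
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                v = (0.7 * vel[i][d]
                     + 1.5 * rng.random() * (best_pos[i][d] - pos[i][d])
                     + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp to keep exp() tame
                pos[i][d] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d]))
            f = fitness(decode(pos[i]))
            if f > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], f
                if f > gfit:
                    gbest, gfit = pos[i][:], f
    return decode(gbest), gfit


if __name__ == "__main__":
    # Made-up toy data: x0 tracks the class, x1 duplicates x0, x2 is unrelated.
    target = [i % 2 for i in range(20)]
    x0 = [2.0 * t + 0.1 * (i % 3) for i, t in enumerate(target)]
    columns = [x0, list(x0), [(i // 2) % 2 for i in range(20)]]
    kept = prefilter(columns, target)
    print(kept, binary_pso(kept, make_fitness(columns, target)))
```

Running the filter first shrinks the search space that the swarm has to explore, which is the motivation the abstract gives for the preprocessing stage. A real experiment would replace `dependence` with an actual MIC estimator and the 1-nearest-neighbour scorer with the classifier under study.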