Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (11): 51-59. DOI: 10.3778/j.issn.1002-8331.1907-0352

• Theory, Research and Development •


Unsupervised Feature Selection Algorithm Based on Maximum Entropy and [l2,0] Norm Constraints

ZHOU Wanying, MA Yingcang, XU Qiuxia, ZHENG Yi   

  1. School of Science, Xi’an Polytechnic University, Xi’an 710600, China
  • Online:2020-06-01 Published:2020-06-01


Abstract:

Unsupervised feature selection can reduce the dimensionality of data and improve the learning performance of algorithms, and it is an important research topic in machine learning and pattern recognition. Unlike most methods, which introduce sparse regularization into the objective function and solve a relaxed problem, an unsupervised feature selection algorithm based on maximum entropy and an [l2,0] norm constraint is proposed. Firstly, an [l2,0] norm equality constraint with a unique and definite meaning, namely the number of selected features, is used; it involves no regularization parameter and thus avoids parameter tuning. Secondly, combined with spectral analysis, the local geometric structure of the data is explored and the similarity matrix is constructed adaptively based on the maximum entropy principle. Finally, an alternating iterative optimization algorithm based on the augmented Lagrangian method is designed to solve the model. Comparative experiments with several other unsupervised feature selection algorithms on four real-world datasets verify the effectiveness of the proposed algorithm.
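Two of the ingredients named in the abstract can be illustrated concretely: the [l2,0] "norm" of a projection matrix counts its nonzero rows (so an equality constraint on it fixes the number of selected features directly), and a maximum-entropy similarity matrix arises as a Gibbs/softmax distribution over pairwise distances. The sketch below is a minimal NumPy illustration of these two ideas, not the paper's actual model; the tolerance `tol` and scale parameter `beta` are hypothetical choices introduced here for demonstration.

```python
import numpy as np

def l20_norm(W, tol=1e-10):
    """Count the nonzero rows of W (rows with l2 norm above tol).
    Constraining this count to k means exactly k features are kept."""
    row_norms = np.linalg.norm(W, axis=1)
    return int(np.sum(row_norms > tol))

def max_entropy_similarity(X, beta=1.0):
    """Build a row-stochastic similarity matrix from pairwise squared
    distances via a softmax (Gibbs) form, which is the maximum-entropy
    distribution under a fixed expected-distance constraint.
    beta is an assumed inverse-temperature parameter."""
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(D, np.inf)                    # exclude self-similarity
    S = np.exp(-beta * D)
    S /= S.sum(axis=1, keepdims=True)              # each row sums to 1
    return S
```

Note the design point: because l20_norm is a discrete count, it cannot be handled by gradient methods directly, which is why most prior work relaxes it to an l2,1 regularizer; the paper instead keeps the equality constraint and handles it inside an augmented Lagrangian scheme.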

Key words: unsupervised feature selection, norm constraint, maximum entropy, augmented Lagrangian