Computer Engineering and Applications, 2020, Vol. 56, Issue (23): 45-52. DOI: 10.3778/j.issn.1002-8331.2006-0323


Unsupervised Feature Selection via Schatten-p Norm and Feature Self-Representation

PENG Ming, ZHANG Haipeng   

  1. College of Mathematics and Information Engineering, Longyan University, Longyan, Fujian 364012, China
  2. State Grid Qinyuan Power Supply Company of Shanxi Electric Power Company, Changzhi, Shanxi 046500, China
  • Online: 2020-12-01 Published: 2020-11-30

Abstract:

Feature selection removes irrelevant and redundant features to obtain a compact representation of the original features that generalizes well. However, noise and outliers in the data tend to inflate the rank of the learned coefficient (affinity) matrix, so the algorithm fails to capture the true low-rank structure of high-dimensional data. This paper therefore proposes an unsupervised feature selection algorithm based on the Schatten-p norm and feature self-representation (SPSR), which uses the Schatten-p norm to approximate the rank minimization problem and feature self-representation to reconstruct the coefficient matrix in the unsupervised feature selection problem. The resulting problem is solved within the augmented Lagrangian multiplier and alternating direction method of multipliers (ADMM) framework to select an effective feature subset. In experimental comparisons with several state-of-the-art feature selection methods on six publicly available datasets, SPSR achieves higher clustering accuracy and effectively identifies representative feature subsets.
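
The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the SPSR solver: the function names `schatten_p` and `self_representation_scores`, the parameter `lam`, and the synthetic data are assumptions made for illustration, and a closed-form ridge regression stands in for the augmented Lagrangian/ADMM optimization. The sketch shows how noise inflates the rank of a matrix, how the Schatten-p value (the sum of the p-th powers of the singular values) reflects that, and how the row norms of a self-representation coefficient matrix could be used to rank features.

```python
import numpy as np


def schatten_p(W, p):
    """Return sum_i sigma_i(W)**p, i.e. the Schatten-p (quasi-)norm raised to the power p."""
    sigma = np.linalg.svd(W, compute_uv=False)
    return float(np.sum(sigma ** p))


def self_representation_scores(X, lam=1.0):
    """Fit X ~ X @ W with a ridge penalty (a stand-in for SPSR, not the paper's solver).

    Returns one score per feature (the l2 norm of the matching row of W)
    together with the coefficient matrix W itself.
    """
    d = X.shape[1]
    gram = X.T @ X
    # Closed-form ridge solution: W = (X^T X + lam * I)^{-1} X^T X
    W = np.linalg.solve(gram + lam * np.eye(d), gram)
    return np.linalg.norm(W, axis=1), W


# Synthetic data (assumed): a rank-3 matrix and a noisy copy of it.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 20))
noisy = low_rank + 0.1 * rng.standard_normal((50, 20))

rank_clean = np.linalg.matrix_rank(low_rank)
rank_noisy = np.linalg.matrix_rank(noisy)

# A small p penalizes the many small singular values introduced by noise.
sp_clean = schatten_p(low_rank, 0.5)
sp_noisy = schatten_p(noisy, 0.5)

# Rank features by the row norms of W and keep the five highest-scoring ones.
scores, W = self_representation_scores(noisy, lam=1.0)
top_features = np.argsort(scores)[::-1][:5]
```

For p = 1 the quantity computed by `schatten_p` is the nuclear norm, and for p = 2 it is the squared Frobenius norm; values of p below 1 give a non-convex quantity that follows the rank more closely, which is the motivation the abstract gives for using it.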

Key words: feature selection, unsupervised learning, Schatten-p norm, feature self-representation
