基于Schatten-p范数和特征自表示的无监督特征选择

doi:10.3778/j.issn.1002-8331.2006-0323

计算机工程与应用, 2020, Vol. 56, Issue (23): 45-52. DOI: 10.3778/j.issn.1002-8331.2006-0323


彭明，张海澎

1.龙岩学院数学与信息工程学院，福建龙岩 364012
2.国网山西省电力公司沁源县供电公司，山西长治 046500

出版日期:2020-12-01 发布日期:2020-11-30

Unsupervised Feature Selection via Schatten-p Norm and Feature Self-Representation

PENG Ming, ZHANG Haipeng

1.College of Mathematics and Information Engineering, Longyan University, Longyan, Fujian 364012, China
2.State Grid Qinyuan Power Supply Company of Shanxi Electric Power Company, Changzhi, Shanxi 046500, China

Online:2020-12-01 Published:2020-11-30

摘要/Abstract

摘要：

特征选择旨在去除不相关和冗余特征，找到具有良好泛化能力的原始特征的紧凑表示。同时，数据中含有的噪声和离群点会使学习获得的系数矩阵的秩变大，使得算法无法捕捉到高维数据中真实的低秩结构。因此，利用Schatten-p范数逼近秩最小化问题和特征自表示重构无监督特征选择问题中的系数矩阵，建立一个基于Schatten-p范数和特征自表示的无监督特征选择（SPSR）算法，并使用增广拉格朗日乘子法和交替方向乘子法框架进行求解。最后在6个公开数据集上与经典无监督特征选择算法进行实验比较，SPSR算法的聚类精度更高，可以有效地识别代表性特征子集。

关键词: 特征选择, 无监督学习, Schatten-p范数, 特征自表示

Abstract:

Feature selection removes irrelevant and redundant features in order to find a compact representation of the original features with good generalization ability. However, noise and outliers in the data tend to increase the rank of the learned coefficient matrix, so the algorithm cannot capture the true low-rank structure of high-dimensional data. This paper therefore proposes an unsupervised feature selection algorithm based on the Schatten-p norm and feature self-representation (SPSR), which uses the Schatten-p norm to approximate the rank minimization problem and feature self-representation to reconstruct the coefficient matrix of the unsupervised feature selection problem. The model is solved using the augmented Lagrange multiplier method and the alternating direction method of multipliers (ADMM) framework. In experiments on six publicly available datasets, SPSR achieves higher clustering accuracy than classical unsupervised feature selection algorithms and effectively identifies representative feature subsets.

Key words: feature selection, unsupervised learning, Schatten-p norm, feature self-representation
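
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the SPSR algorithm: the abstract does not give the objective function or the ADMM updates, so none are reproduced here. The first function computes the p-th power of the Schatten-p norm, i.e. the sum of the singular values raised to the power p, which tends to the rank of the matrix as p approaches 0. The second function scores features by self-representation (each feature is reconstructed from all features, X ≈ XW, and feature i is scored by the norm of row i of W), but it uses a simple ridge penalty as a stand-in for the Schatten-p term. The function names and the parameter `lam` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def schatten_p_power(M, p):
    """Return the sum of the singular values of M raised to the power p."""
    s = np.linalg.svd(M, compute_uv=False)
    s = s[s > 1e-10]  # drop numerically zero singular values
    return float(np.sum(s ** p))


def self_representation_scores(X, lam=1.0):
    """Score features by the row norms of W, where W minimizes
    ||X - X W||_F^2 + lam * ||W||_F^2.

    Assumption: the ridge penalty is a stand-in chosen for its closed-form
    solution; it is NOT the Schatten-p regularizer used by SPSR.
    """
    d = X.shape[1]
    G = X.T @ X
    W = np.linalg.solve(G + lam * np.eye(d), G)  # closed-form ridge solution
    return np.linalg.norm(W, axis=1)


rng = np.random.default_rng(0)

# A 50 x 20 matrix of rank 3: as p shrinks, the value approaches the rank.
L = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 20))
for p in (1.0, 0.5, 0.1, 0.01):
    print(f"p = {p}: {schatten_p_power(L, p):.4f}")

# 100 samples, 6 features; the last feature is identically zero.
X = rng.standard_normal((100, 6))
X[:, 5] = 0.0
scores = self_representation_scores(X)
print("features ranked by score:", np.argsort(-scores))
```

With p = 1 the first function returns the nuclear norm, the usual convex surrogate for rank; smaller values of p follow the rank more closely at the cost of a non-convex problem, which is why iterative schemes such as ADMM are typically used to solve such models.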

彭明，张海澎. 基于Schatten-p范数和特征自表示的无监督特征选择[J]. 计算机工程与应用, 2020, 56(23): 45-52.

PENG Ming, ZHANG Haipeng. Unsupervised Feature Selection via Schatten-p Norm and Feature Self-Representation[J]. Computer Engineering and Applications, 2020, 56(23): 45-52.

