Computer Engineering and Applications, 2017, Vol. 53, Issue (20): 116-121. DOI: 10.3778/j.issn.1002-8331.1604-0366

• Pattern Recognition and Artificial Intelligence •

Low-rank feature selection for multiple-output regression algorithm

YANG Lifeng (杨利锋)1,2, LIN Dahua (林大华)3, DENG Zhenyun (邓振云)1,2, LI Yonggang (李永钢)1,2

  1. Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi 541004, China
  2. Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing, Guilin, Guangxi 541004, China
  3. Guangxi Center for Educational Technology, Nanning 530021, China
  • Online: 2017-10-15 Published: 2017-10-31

Abstract: Existing regression algorithms do not fully exploit the correlations between features and outputs, among the outputs themselves, or among samples, so they tend to produce unstable models when applied to multiple-output regression on high-dimensional data. To address this, a novel method is proposed, called Low-rank Feature Selection for Multiple-output Regression (LFS_MR for short). The method imposes a low-rank constraint to build a low-rank regression model that captures the correlation structure among the output variables. On this low-rank regression model it also applies an L2,p-norm to perform sample selection, which reduces the interference of noise and outliers. In addition, it penalizes the regression coefficient matrix with an L2,p-norm regularization term to perform feature selection, which handles the correlations between features and outputs and mitigates the curse of dimensionality. Experimental results on real-world datasets show that the proposed method achieves very good results in multiple-output regression analysis of high-dimensional data.

Key words: multiple-output regression, low-rank regression, regression coefficient matrix, feature selection
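
The combination described in the abstract, a low-rank model with one L2,p-norm term for sample selection and another for feature selection, can be illustrated with a small sketch. The abstract does not state the objective function, so everything below is an assumption made for illustration: the coefficient matrix is taken to be a product W = AB of a d×r factor and an r×c factor (so that rank(W) ≤ r), and the hypothetical objective is ||Y − XAB||_{2,p} + λ||AB||_{2,p}. The function names `l2p_norm`, `matmul`, and `objective`, and the symbols X, Y, A, B, and `lam`, are not taken from the paper.

```python
import math


def l2p_norm(matrix, p):
    """Mixed L2,p norm: the p-norm of the vector of row-wise Euclidean norms.

    For p < 1 this is only a quasi-norm, but it is still well defined.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    row_norms = [math.sqrt(sum(v * v for v in row)) for row in matrix]
    return sum(r ** p for r in row_norms) ** (1.0 / p)


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def objective(X, Y, A, B, lam, p):
    """Assumed objective (not from the paper):
    ||Y - X A B||_{2,p} + lam * ||A B||_{2,p}.

    X: n x d inputs, Y: n x c outputs,
    A: d x r and B: r x c factors, so W = A B has rank at most r.
    """
    W = matmul(A, B)                      # d x c coefficient matrix
    pred = matmul(X, W)                   # n x c predictions
    residual = [[y - yh for y, yh in zip(y_row, p_row)]
                for y_row, p_row in zip(Y, pred)]
    # Rows of the residual are samples; rows of W are features.
    return l2p_norm(residual, p) + lam * l2p_norm(W, p)
```

In this sketch both terms act row-wise. Each row of the residual corresponds to one sample, so a small p limits how much a single outlying sample contributes to the loss; each row of W corresponds to one feature, so the penalty favors solutions in which entire rows shrink toward zero, which amounts to discarding those features. With p = 2 the norm reduces to the Frobenius norm and neither selection effect is present.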