Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (3): 10-14. DOI: 10.3778/j.issn.1002-8331.1808-0300

• Hot Topics and Reviews •

Sparse Structured Least Squares Twin Support Vector Regression Machine

YAN Liping1, MA Jiajun1, CHEN Wenxing2   

  1. School of Mathematics and Statistics, Xidian University, Xi’an 710126, China
  2. School of Mathematics and Statistics, Ningxia University, Yinchuan 750021, China
  • Published: 2019-02-01 Online: 2019-01-24

Abstract: The least squares twin support vector regression (LSTSVR) machine introduces a least squares loss to reduce the quadratic programming problems in twin support vector regression (TSVR) to the solution of two systems of linear equations, which greatly reduces training time. However, because LSTSVR minimizes an empirical risk based on the least squares loss, it has two shortcomings: (1) it is prone to overfitting ("over-learning"); (2) its solution lacks sparsity, which makes training on large-scale data difficult. To address (1), a structured least squares twin support vector regression machine (S-LSTSVR) is proposed to improve the generalization ability of the model. To address (2), incomplete Cholesky decomposition is used to compute a low-rank approximation of the kernel matrix, yielding a sparse algorithm for solving S-LSTSVR (SS-LSTSVR) that allows the model to be trained efficiently on large-scale data. Experiments on artificial data and UCI data sets show that SS-LSTSVR both avoids overfitting and solves large-scale training problems efficiently.

Key words: least squares twin support vector regression (LSTSVR), structural risk minimization, sparsity, incomplete Cholesky decomposition, large-scale
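
The low-rank step named in the abstract can be illustrated with a minimal sketch of pivoted incomplete Cholesky decomposition, which builds a factor G with K ≈ G Gᵀ one column at a time without forming the full kernel matrix. This is not the authors' SS-LSTSVR implementation: the Gaussian kernel, the function names `rbf_kernel`, `incomplete_cholesky`, and `max_abs_error`, and the parameters `gamma`, `tol`, and `max_rank` are assumptions chosen for illustration, and the way the paper uses the factor inside S-LSTSVR is not reproduced here.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points; the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def incomplete_cholesky(X, kernel, tol=1e-8, max_rank=None):
    """Pivoted incomplete Cholesky factorization of the kernel matrix of X.

    Returns (G, pivots): G is a list of n rows with r entries each such that
    K[i][j] is approximately sum_k G[i][k] * G[j][k]; pivots lists the indices
    of the samples selected as pivots, in order of selection.
    """
    n = len(X)
    max_rank = n if max_rank is None else min(max_rank, n)
    # diag[i] is the residual diagonal K[i][i] - sum_k G[i][k] ** 2.
    diag = [kernel(x, x) for x in X]
    G = [[] for _ in range(n)]
    pivots = []
    while len(pivots) < max_rank:
        # Greedy pivoting: pick the sample with the largest residual.
        p = max(range(n), key=lambda i: diag[i])
        if diag[p] <= tol:
            break  # every remaining residual is below the tolerance
        root = math.sqrt(diag[p])
        row_p = list(G[p])  # snapshot of row p before it grows
        for i in range(n):
            # Only column p of K is evaluated; the full matrix is never stored.
            k_ip = kernel(X[i], X[p])
            c = (k_ip - sum(a * b for a, b in zip(G[i], row_p))) / root
            G[i].append(c)
            diag[i] -= c * c
        diag[p] = 0.0  # the pivot is now represented exactly
        pivots.append(p)
    return G, pivots


def max_abs_error(X, kernel, G):
    """Largest entrywise gap between K and G G^T (forms K; for checking only)."""
    n = len(X)
    return max(
        abs(kernel(X[i], X[j]) - sum(a * b for a, b in zip(G[i], G[j])))
        for i in range(n)
        for j in range(n)
    )


if __name__ == "__main__":
    samples = [[0.3 * i] for i in range(15)]
    k = lambda a, b: rbf_kernel(a, b, gamma=0.5)
    G, pivots = incomplete_cholesky(samples, k, tol=1e-8)
    print("rank:", len(pivots), "max error:", max_abs_error(samples, k, G))
```

Each iteration evaluates a single kernel column, so a rank-r factor costs r·n kernel evaluations and O(n·r²) arithmetic rather than the n² entries of the full matrix. Because the residual matrix stays positive semidefinite, stopping once the largest residual diagonal falls below `tol` also bounds every entrywise error by roughly `tol`.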