计算机工程与应用 ›› 2018, Vol. 54 ›› Issue (4): 135-140.DOI: 10.3778/j.issn.1002-8331.1609-0026

• 模式识别与人工智能 • 上一篇    下一篇

一种利用Screening加速技巧的Lasso算法

邱俊洋,潘志松,易  磊,陶  蔚,张梁梁   

  1. 中国人民解放军理工大学 指挥信息系统学院,南京 210007
  • 出版日期:2018-02-15 发布日期:2018-03-07

Lasso algorithm using Screening accelerated technique

QIU Junyang, PAN Zhisong, YI Lei, TAO Wei, ZHANG Liangliang   

  1. College of Command Information System, PLA University of Science and Technology, Nanjing 210007, China
  • Online:2018-02-15 Published:2018-03-07

摘要: Lasso(Least absolute shrinkage and selection operator)是目前广为应用的一种稀疏特征选择算法。经典的Lasso算法通过对高维数据进行特征选择一定程度上降低了计算开销,然而,求解Lasso问题目前仍面临诸多困难与挑战,例如当特征维数和样本数量非常大时,甚至无法将数据矩阵加载到主存储器中。为了应对这一挑战,Screening加速技巧成为近年来研究的热点。Screening可以在问题优化求解之前将稀疏优化结果中系数必然为0的无效特征筛选出来并剔除,从而极大地降低数据维度,在不损失问题求解精度的前提下,加速稀疏优化问题的求解速度。首先推导了Lasso的对偶问题,根据对偶问题的特性得出基于对偶多面投影的Screening加速技巧,最后将Screening加速技巧引入Lasso特征选择算法,并在多个高维数据集上进行实验,通过加速比、识别率以及算法运行时间三个指标验证了Screening加速技巧在Lasso算法上的良好性能。

关键词: Lasso算法, Screening加速技巧, 稀疏特征选择, 高维数据

Abstract: Recently, Lasso is one of the most popular algorithms for sparse feature selection in the application. Traditional Lasso algorithm reduces the computational overhead to a certain extent through feature selection. However, sometimes it may become impossible to load the data matrix into the main memory of the computer when the feature dimension or the number of samples is extremely large. Thus it can be seen that there are still many difficulties and challenges in solving Lasso algorithm. To address these challenges, the Screening accelerating technique has been proposed and becomes the research focus in recent years. The Screening technique is able to quickly identify and remove the invalid features before the optimization process. Then it only needs to deal with the reduced data, which can greatly accelerate the solving efficiency. The most important is that the Screening technique can guarantee that the discarded features have zero coefficients in the solution vector so as to reduce the data dimension as well as maintain the solution accuracy. This paper firstly derives the dual problem of Lasso, then introduces the Screening accelerated technique according to the properties of the dual problem. Besides, it evaluates the performance of the algorithms under several high dimensional datasets using three metrics, speedup ratio, detection rate and running time. Results show that the Screening accelerated technique is effective in solving efficiency of Lasso.

Key words: Least absolute shrinkage and selection operator(Lasso), Screening accelerating technique, sparse feature selection, high-dimensional data