Latent Factor Prediction Model Combining L1 and L2 Regularization Constraints

Computer Engineering and Applications, 2019, Vol. 55, Issue (19): 121-127. DOI: 10.3778/j.issn.1002-8331.1807-0140

WANG Dexian, HE Xianbo, HE Chunlin, ZHOU Kun, CHEN Minzhi

School of Computer Science, China West Normal University, Nanchong, Sichuan 637000, China

Online: 2019-10-01 Published: 2019-09-30

Abstract

In big data applications, the missing entries of a high-dimensional sparse matrix are usually predicted with a latent factor (LF) model trained by stochastic gradient descent (SGD). Regularization terms are commonly added during training to improve the model's performance. Because the L1 regularization term is non-differentiable, existing LF models mainly incorporate only the L2 regularization term (SGD_LF). However, L1 regularization can increase the sparsity of the model and strengthen its solving ability. This paper therefore proposes a latent factor model constrained by both L1 and L2 regularization (SPGD_LF), whose objective function includes both terms. Since the objective function satisfies the Lipschitz condition, it is approximated by a second-order Taylor expansion, from which a stochastic gradient descent solver is constructed. During the descent, soft thresholding handles the boundary optimization problem that corresponds to the L1 regularization term. This scheme better expresses the features of the known entries of the target matrix in the latent factor space, together with the corresponding community relationships, and improves the generalization ability of the model. Experiments on two large datasets from industrial applications show that the SPGD_LF model significantly improves prediction accuracy, sparsity, and convergence rate.

Key words: big data application, high-dimensional and sparse matrix, latent factor
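The abstract describes a gradient step on the smooth part of the objective followed by soft thresholding for the L1 term. The following is a minimal sketch of that idea, not the authors' SPGD_LF implementation. It assumes a per-entry objective of squared error plus `lam2` times the squared L2 norm plus `lam1` times the L1 norm of the two factor vectors involved. The function names (`soft_threshold`, `train_lf`, `rmse`), the hyperparameter values, and the toy data `demo` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1: shrink each entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def train_lf(ratings, n_rows, n_cols, rank=4, eta=0.02,
             lam1=0.01, lam2=0.02, epochs=300, seed=0):
    """Stochastic proximal gradient descent for a latent factor model.

    ratings is a list of (row, col, value) triples for the known entries.
    All default hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_rows, rank))  # row factors
    Q = rng.normal(scale=0.1, size=(n_cols, rank))  # column factors
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - P[u] @ Q[i]
            # Gradient step on the smooth part (squared error + L2 term).
            p_new = P[u] + eta * (err * Q[i] - lam2 * P[u])
            q_new = Q[i] + eta * (err * P[u] - lam2 * Q[i])
            # Soft thresholding handles the non-differentiable L1 term.
            P[u] = soft_threshold(p_new, eta * lam1)
            Q[i] = soft_threshold(q_new, eta * lam1)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error over the known entries."""
    errs = [r - P[u] @ Q[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errs))))


# Toy 4 x 3 matrix with eight known entries (made-up values).
demo = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
        (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
P, Q = train_lf(demo, n_rows=4, n_cols=3)
print("training RMSE:", rmse(demo, P, Q))
```

Setting `lam1=0` in this sketch reduces the update to a plain L2-regularized SGD step, which corresponds to the SGD_LF baseline that the abstract contrasts against.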

WANG Dexian, HE Xianbo, HE Chunlin, ZHOU Kun, CHEN Minzhi. Latent Factor Prediction Model Combining L1 and L2 Regularization Constraints[J]. Computer Engineering and Applications, 2019, 55(19): 121-127.

