Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (14): 71-76. DOI: 10.3778/j.issn.1002-8331.1711-0152

• Big Data and Cloud Computing •

Parallel collaborative depth recommendation model based on Spark

JIA Xiaoguang

  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
  • Online: 2018-07-15 Published: 2018-08-06

Abstract: Collaborative Deep Learning (CDL) exploits the strong feature-learning ability and robust model fitting of neural networks to address the sharp drop in recommender-system performance under data sparsity. With large volumes of data, however, model training becomes difficult to maintain and can give rise to various unpredictable problems. To address this, the paper studies CDL and its parallelization and proposes an improved model optimized for learning item content, "CDL with item private node" (CDL-i). CDL-i modifies the autoencoder network of traditional CDL by adding private network nodes: while the network parameters remain shared, a private bias term is added for each item so that the network learns item-content parameters in a more targeted way, improving the model's ability to detect item content in recommender systems. The algorithm is also parallelized: the model is split into parts, a method for training CDL-i in parallel is proposed, and the method is ported to a Spark distributed cluster, where the parameters of each part are optimized in parallel. This increases both the scale of data the model can handle and its scalability. Experiments on multiple real-world data sets verify the effectiveness and efficiency of the proposed parallel deep recommendation algorithm.

Key words: deep learning, recommender system, Collaborative Deep Learning (CDL), Spark
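
The idea of a per-item private bias on top of shared network parameters can be illustrated with a minimal sketch. The abstract does not say where the private bias enters the network, which activation is used, or how parameters are stored, so everything below is an assumption made for illustration: a single sigmoid encoder layer, and the hypothetical names `make_params`, `encode`, and `item_bias`. It is not the paper's implementation.

```python
import math
import random


def make_params(n_items, n_in, n_hidden, seed=0):
    """Create shared encoder parameters plus one private bias vector per item.

    Assumption: a single encoder layer; the paper's architecture may differ.
    """
    rng = random.Random(seed)
    return {
        # Shared by all items.
        "W": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
              for _ in range(n_hidden)],
        "b": [0.0] * n_hidden,
        # Private to each item (hypothetical name), initialised to zero.
        "item_bias": [[0.0] * n_hidden for _ in range(n_items)],
    }


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(params, item_id, x):
    """Encode the content vector x of one item.

    Pre-activation = shared weights * x + shared bias + that item's private bias.
    """
    private = params["item_bias"][item_id]
    hidden = []
    for k, row in enumerate(params["W"]):
        z = sum(w * xi for w, xi in zip(row, x)) + params["b"][k] + private[k]
        hidden.append(sigmoid(z))
    return hidden
```

With all private biases at zero, two items with identical content get identical encodings. Changing one item's private bias shifts only that item's representation, which is the sense in which the shared network can adapt to individual items.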