Computer Engineering and Applications (计算机工程与应用), 2019, Vol. 55, Issue (1): 64-69. DOI: 10.3778/j.issn.1002-8331.1710-0160

• Big Data and Cloud Computing •

Online Distributed Recommendation Algorithm Based on Data Stream and Peer-to-Peer Network

CONG Yihao (丛义昊), YU Yanhua (于艳华)

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Online: 2019-01-01 Published: 2019-01-07

Abstract: Recommendation algorithms are among the most widely used algorithms in data mining. Most existing recommendation algorithms, however, target static data and adapt poorly to dynamic data; recommendation over data streams is one way to address this. On distributed platforms, controlling model training with a parameter server suffers from delayed gradients and stragglers. To address these problems, a new method is proposed that replaces the parameter server with a peer-to-peer parameter exchange network, and a forgetting strategy and anomalous-rating detection are introduced into the training process. The algorithm is designed and implemented on Flink, a recent distributed stream computing framework, and evaluated on the classic MovieLens-1m dataset. Experimental results show that the algorithm halves the communication overhead while maintaining recommendation accuracy.

Key words: online matrix factorization, stream computing, distributed collaborative filtering, peer-to-peer network
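
The abstract names three ingredients: online matrix factorization over a rating stream, a forgetting strategy with anomalous-rating detection, and peer-to-peer parameter exchange in place of a parameter server. The abstract gives no formulas, so the Python sketch below is only a hypothetical illustration of how such pieces might fit together, not the authors' algorithm or their Flink implementation. Everything specific in it is an assumption: the class `OnlineMF`, the function `exchange`, the multiplicative `decay` used as the forgetting rule, the residual threshold `max_err` used as the anomaly check, the fixed baseline `mu = 3.0` (the midpoint of the 1-5 MovieLens rating scale), and pairwise averaging of shared item vectors as the exchange step.

```python
import random


class OnlineMF:
    """Toy streaming matrix-factorization worker (illustrative assumptions only)."""

    def __init__(self, k=8, lr=0.05, reg=0.02, decay=0.999, max_err=2.5,
                 mu=3.0, seed=0):
        self.k, self.lr, self.reg = k, lr, reg
        self.decay = decay      # assumed forgetting factor, applied on each update
        self.max_err = max_err  # assumed residual threshold for anomalous ratings
        self.mu = mu            # assumed fixed baseline rating
        self.rng = random.Random(seed)
        self.P = {}             # user id -> latent vector
        self.Q = {}             # item id -> latent vector

    def _vec(self, table, key):
        # Lazily create a small random vector for unseen users or items.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.k)]
        return table[key]

    def predict(self, u, i):
        p, q = self._vec(self.P, u), self._vec(self.Q, i)
        return self.mu + sum(a * b for a, b in zip(p, q))

    def update(self, u, i, r):
        """Consume one (user, item, rating) event; return False if it is skipped."""
        err = r - self.predict(u, i)
        if abs(err) > self.max_err:
            return False  # treated as an anomalous rating: no model update
        p, q = self.P[u], self.Q[i]
        for f in range(self.k):
            # Forgetting: shrink the old factors before the SGD step.
            pf, qf = p[f] * self.decay, q[f] * self.decay
            p[f] = pf + self.lr * (err * qf - self.reg * pf)
            q[f] = qf + self.lr * (err * pf - self.reg * qf)
        return True


def exchange(a, b):
    """Peer-to-peer step: two workers average the item vectors they both hold."""
    for i in a.Q.keys() & b.Q.keys():
        avg = [(x + y) / 2 for x, y in zip(a.Q[i], b.Q[i])]
        a.Q[i], b.Q[i] = list(avg), list(avg)
```

In this sketch each worker keeps its own user vectors and only item vectors travel between peers, so no central server holds the model; a real deployment would also need a rule for choosing exchange partners and exchange frequency, which the abstract does not specify.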