Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (12): 95-100.

• Big Data and Cloud Computing •

Collaborative filtering recommendation based on improved Minhash algorithm

WU Bowen, CHEN Xi

  1. College of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410076, China
  • Online: 2016-06-15 Published: 2016-06-14

Abstract: Collaborative filtering recommendation algorithms study users' preferences in order to recommend content of interest from massive data resources. Measuring the similarity between users (or resources) is the core of collaborative filtering. In systems with large volumes of data, similarity measurement faces problems of accuracy and computational complexity, which degrade recommendation quality. This paper proposes an improved collaborative filtering recommendation algorithm that extracts multi-valued information about users' interest preferences, measures user similarity with an improved Minhash algorithm, and combines this with MapReduce distributed computing to generate user neighbors reasonably and efficiently and to produce rating-based recommendations for users. Experimental results show that, on large data sets, the improved algorithm improves recommendation accuracy and efficiency and reduces recommendation time.

Key words: collaborative filtering, interest preferences, similarity calculation, distributed computing
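
The abstract does not spell out the improved Minhash variant, so the following is only a minimal sketch of the standard Minhash idea it builds on: each user's set of rated items is compressed into a short signature, and the fraction of matching signature positions approximates the Jaccard similarity of the two item sets. The function names, the linear hash family, the number of hash functions, and the sample users are all assumptions made for illustration; none of them come from the paper.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, assumed larger than any item ID


def make_hash_funcs(num_hashes, seed=0):
    """Build hash functions h(x) = (a*x + b) mod PRIME with random a, b."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
              for _ in range(num_hashes)]
    # Bind a and b as defaults so each lambda keeps its own parameters.
    return [lambda x, a=a, b=b: (a * x + b) % PRIME for a, b in params]


def minhash_signature(item_ids, hash_funcs):
    """For each hash function, keep the minimum hash over the user's items."""
    return [min(h(i) for i in item_ids) for h in hash_funcs]


def estimated_similarity(sig_a, sig_b):
    """Fraction of equal signature positions, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity, used here only to check the estimate."""
    return len(a & b) / len(a | b)


# Hypothetical users mapped to the integer IDs of items they rated.
ratings = {
    "u1": {1, 2, 3, 4, 5, 6},
    "u2": {1, 2, 3, 4, 7, 8},
    "u3": {20, 21, 22},
}
hash_funcs = make_hash_funcs(num_hashes=200)
signatures = {user: minhash_signature(items, hash_funcs)
              for user, items in ratings.items()}

if __name__ == "__main__":
    print("exact   u1/u2:", jaccard(ratings["u1"], ratings["u2"]))
    print("minhash u1/u2:", estimated_similarity(signatures["u1"], signatures["u2"]))
    print("minhash u1/u3:", estimated_similarity(signatures["u1"], signatures["u3"]))
```

In a distributed setting such as the one the abstract describes, signature computation is independent per user, so it could plausibly be done in a map step and the pairwise comparison in a reduce step; that split is an assumption here, not the authors' stated design.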