Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (15): 124-128.

• Database, Data Mining, Machine Learning •

基于Hadoop分布式改进聚类协同过滤推荐算法研究

孙天昊,黎安能,李  明,朱庆生   

  1. 重庆大学 计算机学院,重庆 400044
  • 出版日期:2015-08-01 发布日期:2015-08-14

Study on distributed improved clustering collaborative filtering algorithm based on Hadoop

SUN Tianhao, LI Anneng, LI Ming, ZHU Qingsheng   

  1. College of Computer Science, Chongqing University, Chongqing 400044, China
  • Online: 2015-08-01 Published: 2015-08-14

摘要: 为了改善协同过滤推荐算法在大数据下的稀疏性和可扩展性问题,提出一种基于Hadoop平台的分布式改进聚类协同过滤推荐算法。在分布式平台下,离线对高维稀疏数据采用矩阵分解算法预处理,改善数据稀疏性后通过改进项目聚类算法构建聚类模型,根据聚类模型和相似性计算形成推荐候选空间,在线完成推荐。实验验证该算法能够有效改善推荐系统的推荐质量并大大提高推荐效率,同时在云环境中具有良好可扩展性。

关键词: 协同过滤, Hadoop, 矩阵分解, 聚类, 分布式计算

Abstract: To alleviate the data sparsity and scalability problems of collaborative filtering recommendation algorithms on big data, this paper integrates matrix factorization with distributed computing and proposes a distributed improved clustering collaborative filtering algorithm based on Hadoop. Offline, on the distributed platform, the ALS matrix factorization algorithm fills in the high-dimensional sparse data. An improved item clustering algorithm then clusters the filled matrix to build a clustering model. The clusters and similarity computations together define the candidate set for recommendation, and recommendations are generated online. Experimental results show that the proposed algorithm effectively improves recommendation quality, greatly improves recommendation efficiency, and scales well in cloud environments.

Key words: collaborative filtering, Hadoop, matrix factorization, clustering, distributed computing
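
The pipeline outlined in the abstract (ALS filling, item clustering, candidate set, recommendation) can be illustrated with a minimal single-machine sketch. It is not the paper's implementation: it does not run on Hadoop, plain k-means stands in for the paper's improved item clustering algorithm (which the abstract does not specify), and the toy matrix `R`, the function names, and all parameter values (`k`, `lam`, `c`, `top_n`) are assumptions made for illustration.

```python
import math
import random

def solve(A, b):
    """Solve the small dense system A x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def als_fill(R, k=2, lam=0.1, iters=20, seed=0):
    """Fill missing ratings (None) using a rank-k regularized ALS factorization."""
    rng = random.Random(seed)
    m, n = len(R), len(R[0])
    U = [[rng.random() for _ in range(k)] for _ in range(m)]
    V = [[rng.random() for _ in range(k)] for _ in range(n)]

    def update(rows, fixed, observed):
        # Ridge-regression solve for each row factor, the other side held fixed.
        for i in range(len(rows)):
            obs = observed(i)  # list of (index into `fixed`, rating)
            A = [[lam * (a == b) + sum(fixed[j][a] * fixed[j][b] for j, _ in obs)
                  for b in range(k)] for a in range(k)]
            rhs = [sum(fixed[j][a] * r for j, r in obs) for a in range(k)]
            rows[i] = solve(A, rhs)

    for _ in range(iters):
        update(U, V, lambda u: [(j, R[u][j]) for j in range(n) if R[u][j] is not None])
        update(V, U, lambda j: [(u, R[u][j]) for u in range(m) if R[u][j] is not None])
    # Keep observed ratings; predict only the missing entries.
    return [[R[u][j] if R[u][j] is not None
             else sum(a * b for a, b in zip(U[u], V[j]))
             for j in range(n)] for u in range(m)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def kmeans_items(F, c=2, iters=10, seed=0):
    """Cluster item columns of the filled matrix (plain k-means as a stand-in)."""
    items = [list(col) for col in zip(*F)]
    rng = random.Random(seed)
    centers = [items[j][:] for j in rng.sample(range(len(items)), c)]
    labels = [0] * len(items)
    for _ in range(iters):
        labels = [min(range(c), key=lambda g: sum((x - y) ** 2
                  for x, y in zip(v, centers[g]))) for v in items]
        for g in range(c):
            members = [v for v, l in zip(items, labels) if l == g]
            if members:
                centers[g] = [sum(col) / len(members) for col in zip(*members)]
    return items, labels

def recommend(R, items, labels, user, top_n=2):
    """Candidates: unrated items in the cluster of the user's top-rated item,
    ranked by cosine similarity to that item (an assumed candidate rule)."""
    rated = [j for j, r in enumerate(R[user]) if r is not None]
    fav = max(rated, key=lambda j: R[user][j])
    cands = [j for j in range(len(items))
             if labels[j] == labels[fav] and R[user][j] is None]
    cands.sort(key=lambda j: cosine(items[j], items[fav]), reverse=True)
    return cands[:top_n]

# Toy user-by-item rating matrix; None marks a missing rating.
R = [[5, 4, None, 1, None],
     [4, None, 5, 1, 1],
     [1, 1, None, 5, 4],
     [None, 1, 2, 4, 5]]
F = als_fill(R)                    # offline: fill the sparse matrix
items, labels = kmeans_items(F)    # offline: build the item clusters
recs = recommend(R, items, labels, user=0)  # online: restricted candidate search
```

Restricting the online search to one cluster is what keeps the recommendation step cheap in this sketch: the expensive factorization and clustering run once offline, and each request compares only a handful of items. In a Hadoop setting those offline stages would presumably be expressed as distributed jobs, which this example does not attempt.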