Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (16): 152-157. DOI: 10.3778/j.issn.1002-8331.1712-0089

• Pattern Recognition and Artificial Intelligence •

Collaborative filtering recommendation algorithm based on clustering and random forest

杨兴雨,李华平,张宇波   

  1. School of Management, Guangdong University of Technology, Guangzhou 510520
  • Publication date: 2018-08-15 Release date: 2018-08-09

Collaborative filtering algorithm based on clustering and random forests

YANG Xingyu, LI Huaping, ZHANG Yubo   

  1. School of Management, Guangdong University of Technology, Guangzhou 510520, China
  • Online: 2018-08-15 Published: 2018-08-09

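Abstract: To address the low online recommendation efficiency of neighborhood-based collaborative filtering algorithms, an algorithm whose rating prediction model can be trained offline is proposed. A clustering algorithm is used to reduce the dimensionality of the user vectors and item vectors in the user-item rating matrix, and the data are transformed so that they suit a supervised model. A random forest model is then trained offline on the transformed data; during online recommendation, ratings are predicted simply by following the rules of the random forest model, with no need to search for the nearest neighbor users or items. Experimental results show that, without reducing rating prediction accuracy, the algorithm's online recommendation efficiency is far higher than that of neighborhood-based collaborative filtering algorithms.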

Keywords: collaborative filtering, recommendation algorithm, clustering, random forest

Abstract: To address the inefficiency of online recommendation in neighborhood-based collaborative filtering algorithms, this paper proposes a method that trains a rating prediction model offline. The method first reduces, by clustering, the dimensions of the user vectors and the item vectors in the user-item rating matrix, and transforms the data so that supervised learning models can be applied. A random forest model is then trained on the transformed data, and online rating prediction is made by the previously trained model without searching for the nearest neighbors. Experimental results show that the method performs much better than neighborhood-based collaborative filtering algorithms in terms of online recommendation efficiency, without decreasing the accuracy of rating prediction.

Key words: collaborative filtering, recommendation algorithm, clustering, random forests
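
As a rough illustration of the data transformation described in the abstract, the following Python sketch turns (user, item, rating) triples into supervised samples. It is not the authors' implementation: the abstract does not say how the clustered vectors are built, so averaging ratings within each cluster, the 0.0 placeholder for empty clusters, and the names `cluster_means` and `build_training_set` are all assumptions made here. The cluster assignments are taken as given and could come from any clustering algorithm.

```python
from collections import defaultdict


def cluster_means(ratings, clusters, n_clusters):
    """Compress a sparse rating vector into per-cluster mean ratings.

    ratings:  dict mapping id -> observed rating
    clusters: dict mapping id -> cluster index in [0, n_clusters)
    Clusters without any observed rating get 0.0 (an assumed placeholder).
    """
    sums = [0.0] * n_clusters
    counts = [0] * n_clusters
    for key, rating in ratings.items():
        c = clusters[key]
        sums[c] += rating
        counts[c] += 1
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]


def build_training_set(triples, item_cluster, user_cluster,
                       n_item_clusters, n_user_clusters):
    """Turn (user, item, rating) triples into supervised samples (X, y)."""
    by_user = defaultdict(dict)  # user -> {item: rating}
    by_item = defaultdict(dict)  # item -> {user: rating}
    for user, item, rating in triples:
        by_user[user][item] = rating
        by_item[item][user] = rating

    # Reduced user vector: one mean rating per item cluster.
    user_feat = {u: cluster_means(rs, item_cluster, n_item_clusters)
                 for u, rs in by_user.items()}
    # Reduced item vector: one mean rating per user cluster.
    item_feat = {i: cluster_means(rs, user_cluster, n_user_clusters)
                 for i, rs in by_item.items()}

    # Each sample concatenates the two reduced vectors; the label is the rating.
    X = [user_feat[u] + item_feat[i] for u, i, _ in triples]
    y = [r for _, _, r in triples]
    return X, y


# Toy data with hypothetical ids and hand-picked cluster assignments.
triples = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4), ("u2", "i3", 1)]
item_cluster = {"i1": 0, "i2": 0, "i3": 1}
user_cluster = {"u1": 0, "u2": 1}

X, y = build_training_set(triples, item_cluster, user_cluster, 2, 2)
```

The resulting `X` and `y` could then be used to fit a random forest regressor offline (for example scikit-learn's `RandomForestRegressor`), so that online prediction only requires evaluating the trained trees. In this simplified sketch each sample's own rating contributes to its feature averages; a careful implementation would exclude it to avoid leaking the label.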