Computer Engineering and Applications, 2021, Vol. 57, Issue (17): 96-105. DOI: 10.3778/j.issn.1002-8331.2008-0407


Fast Nearest-Neighbor Searching Method for Collaborative Filtering

WANG Yong, ZHAO Xuhui, LI Xiaoguang, XIAO Ling

  1. Key Laboratory of Electronic Commerce and Logistics of Chongqing, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Online: 2021-09-01  Published: 2021-08-30

一种面向协同过滤的快速最近邻居搜索方法

王永,赵旭辉,李晓光,肖玲   

  1. 重庆邮电大学 电子商务与现代物流重点实验室,重庆 400065

Abstract:

To solve the problem that searching for nearest neighbors in collaborative filtering is time-consuming, and the problem that neighbor information is not effectively utilized in the prediction calculation, a method for fast nearest-neighbor searching is proposed. The proposed method changes how the data in the rating matrix are organized and constructs two kinds of lists: a list of user ratings for each item and a list of item ratings for each user. Using these two lists, the users or items that actually influence the predicted rating value are selected, and the neighbor set of the target user or target item is then determined from them. The proposed method eliminates unnecessary similarity calculations, which improves computational efficiency. It also effectively guarantees the neighbor utilization rate in the prediction calculation, which improves recommendation quality. Experimental results on the Movielens100k and Movielens1M datasets show that the proposed method substantially improves collaborative filtering on all four metrics considered: running time, MAE, RMSE, and F1 value. The proposed method therefore has good application value in the field of recommendation systems.

Key words: nearest-neighbor searching, collaborative filtering, recommendation algorithm, neighbor utilization, online recommendation
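
The two-list idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the neighbor-selection rule, or the prediction formula, so the cosine similarity over co-rated items, the top-k cutoff, the similarity-weighted average, and all function names below (`build_lists`, `nearest_neighbors`, `predict`) are assumptions chosen for illustration. The sketch shows the user-based case: only users who appear in the target item's rating list can contribute to the prediction, so similarities are computed for those users alone.

```python
from collections import defaultdict
from math import sqrt


def build_lists(ratings):
    """Reorganize (user, item, rating) triples into the two lists.

    user_items[u] holds the item ratings of user u;
    item_users[i] holds the user ratings of item i.
    """
    user_items = defaultdict(dict)
    item_users = defaultdict(dict)
    for user, item, rating in ratings:
        user_items[user][item] = rating
        item_users[item][user] = rating
    return user_items, item_users


def cosine(a, b):
    """Cosine similarity over co-rated items (an assumed choice of measure)."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(target_user, target_item, user_items, item_users, k):
    """Return up to k (user, similarity) pairs for predicting target_item.

    Candidates come only from the target item's rating list, so users who
    never rated the item incur no similarity computation.
    """
    target_profile = user_items.get(target_user, {})
    scored = []
    for user in item_users.get(target_item, {}):
        if user == target_user:
            continue
        sim = cosine(target_profile, user_items[user])
        if sim > 0:
            scored.append((user, sim))
    # Highest similarity first; ties broken by user id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


def predict(target_user, target_item, user_items, item_users, k):
    """Similarity-weighted average of the neighbors' ratings (assumed formula)."""
    neighbors = nearest_neighbors(target_user, target_item,
                                  user_items, item_users, k)
    if not neighbors:
        return None
    weighted = sum(sim * item_users[target_item][user] for user, sim in neighbors)
    return weighted / sum(sim for _, sim in neighbors)
```

Because every selected neighbor has rated the target item, each one contributes to the weighted average, which is one plausible reading of what the abstract calls neighbor utilization. The item-based case would be symmetric, drawing candidates from the target user's item list instead.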

摘要:

针对协同过滤模型中寻找邻居集耗时,且部分邻居信息未能有效用于预测计算的问题,提出了一种快速搜寻最近邻居的方法。该方法改变了评分矩阵中数据组织方式,通过构建项目的用户评分列表和用户的项目评分列表,以此来筛选出对预测评分值产生影响的用户或项目,进而得到目标用户或项目的邻居集。该方法排除了不必要的相似性计算,提高了运算效率;而且还有效保证了预测计算中的邻居利用率,提高了推荐质量。在Movielens100k与Movielens1M两个数据集上的实验结果表明,所提出算法在运行时间、MAE、RMSE、F1值四个指标上均有较大提升。因此该算法在推荐系统领域具有良好的应用价值。

关键词: 最近邻居搜索, 协同过滤, 推荐算法, 邻居利用率, 线上推荐