Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (21): 6-11.

• Doctoral Forum •

Rating matrix completion algorithm based on number of clusters

LIU Bo1,2,3, HE Xiping1,2,3   

  1. Chongqing Engineering Laboratory for Detection, Control and Integrated System, Chongqing Technology and Business University, Chongqing 400067, China
  2. Chongqing Key Laboratory of Electronic Commerce & Supply Chain System CTBU, Chongqing Technology and Business University, Chongqing 400067, China
  3. School of Computer Science and Information Engineering, Chongqing Technology and Business University, Chongqing 400067, China
  • Online: 2015-11-01 Published: 2015-11-16

Abstract: Rating matrices are high-dimensional, sparse, and low-rank, and low-rank matrix completion is the main approach to studying them. For these algorithms, different ranks of the rating matrix lead to different recovery accuracy. However, there is currently no theory for estimating the rank of a rating matrix, which limits the application of these algorithms. This paper theoretically analyzes the relationship between the number of user clusters and the rank of the rating matrix, gives a method for computing the number of user clusters, and on this basis proposes a Clusters Number Rank-1 Matrix Completion (CN-R1MC) algorithm to recover the rating matrix. Experiments on several recommender-system datasets show that the number of user clusters approximates the rank of the rating matrix well, which plays an important role in improving the recovery accuracy for rating matrices. The proposed algorithm has good practical value.

Key words: rating matrix, low-rank matrix completion, rank-one matrix, number of user clusters, singular value decomposition
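
The abstract does not spell out the steps of CN-R1MC, so the following is only an illustrative sketch of the two ideas it names, not the authors' algorithm. It assumes an idealized setting in which every user in a cluster has the same rating profile, so the rank of the rating matrix cannot exceed the number of clusters, and it uses a simple greedy rank-one scheme as a stand-in for the completion step. The function names `clustered_ratings` and `greedy_rank_one_completion` are hypothetical, and the cluster count `k` is taken as known rather than computed.

```python
import numpy as np


def clustered_ratings(n_users, n_items, k, rng):
    """Build a rating matrix where each user copies one of k cluster profiles."""
    profiles = rng.integers(1, 6, size=(k, n_items)).astype(float)  # ratings 1..5
    labels = np.arange(n_users) % k  # every cluster receives at least one user
    return profiles[labels], labels


def greedy_rank_one_completion(observed, mask, n_terms):
    """Sum up to n_terms rank-one matrices, each fitted to the observed residual.

    Illustrative stand-in only: each step takes the top singular pair of the
    residual (zero on unobserved entries) and scales it by a least-squares
    step size computed on the observed entries.
    """
    estimate = np.zeros_like(observed)
    for _ in range(n_terms):
        residual = np.where(mask, observed - estimate, 0.0)
        u, _, vt = np.linalg.svd(residual, full_matrices=False)
        basis = np.outer(u[:, 0], vt[0])
        denom = np.sum(basis[mask] ** 2)
        if denom == 0.0:
            break  # nothing left to fit on the observed entries
        step = np.sum(residual[mask] * basis[mask]) / denom
        estimate += step * basis
    return estimate


rng = np.random.default_rng(0)
k = 3  # assumed number of user clusters
ratings, labels = clustered_ratings(n_users=30, n_items=20, k=k, rng=rng)

# With identical profiles inside each cluster, the rank is bounded by k.
rank = np.linalg.matrix_rank(ratings)

# Hide roughly 40% of the entries, then fit k rank-one terms.
mask = rng.random(ratings.shape) < 0.6
observed = np.where(mask, ratings, 0.0)
estimate = greedy_rank_one_completion(observed, mask, n_terms=k)

initial_error = np.linalg.norm(observed[mask])
final_error = np.linalg.norm((ratings - estimate)[mask])
print("rank:", rank, "clusters:", k)
print("observed-entry error before/after:", initial_error, final_error)
```

Real rating data only approximately follows such a cluster structure, so in practice the cluster count would serve as an estimate of the rank rather than an exact bound, which is the role the abstract assigns to it.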