Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (21): 62-67. DOI: 10.3778/j.issn.1002-8331.1605-0227

• Big Data and Cloud Computing •

LSHBMRPK-means algorithm and its application

LUO Jun, LI Jinhua   

  1. College of Data Science and Software Engineering, Qingdao University, Qingdao, Shandong 266071, China
  • Online:2017-11-01 Published:2017-11-15

Abstract: The traditional k-means clustering algorithm suffers from extremely high time complexity and poor clustering quality when applied to big data. To address this, the paper proposes LSHBMRPK-means, a MapReduce-parallelized k-means clustering algorithm based on locality-sensitive hashing (LSH). To address the scalability problem of recommender systems, LSHBMRPK-means is applied to cluster-based collaborative filtering. In addition, to handle the sparsity of the rating data, a latent factor model (LFM) is used to fill in the missing ratings, yielding an LFM-based LSHBMRPK-means collaborative filtering algorithm. Experimental results show that LSHBMRPK-means improves both the efficiency and the quality of clustering, and that the LFM-based LSHBMRPK-means collaborative filtering algorithm scales well while also resolving the poor clustering quality caused by sparse rating data.

Key words: big data, k-means, locality sensitive hashing function, MapReduce, recommendation algorithm
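
The abstract does not say how the locality-sensitive hash enters the k-means pipeline, so the following Python is only a minimal sketch of one plausible reading, not the paper's method. It assumes a sign-random-projection hash family, uses the centroids of the k most populated hash buckets as initial centers, and expresses each k-means iteration as a map step (assign a point to its nearest center) and a reduce step (average the points per center). All function names, the `planes` parameter, and the seeding rule are hypothetical; the map and reduce steps are ordinary functions standing in for a real MapReduce job.

```python
from collections import defaultdict


def mean(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lsh_key(point, planes):
    """Sign-random-projection hash (assumed family): one bit per hyperplane."""
    return tuple(sum(x * w for x, w in zip(point, plane)) >= 0 for plane in planes)


def seed_centers(points, k, planes):
    """Assumed seeding rule: centroids of the k most populated LSH buckets."""
    buckets = defaultdict(list)
    for pt in points:
        buckets[lsh_key(pt, planes)].append(pt)
    if len(buckets) < k:
        raise ValueError("fewer than k non-empty buckets; use more hyperplanes")
    largest = sorted(buckets.values(), key=len, reverse=True)[:k]
    return [mean(bucket) for bucket in largest]


def map_assign(point, centers):
    """Map step: emit (index of nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))
    return nearest, point


def reduce_update(pairs, centers):
    """Reduce step: average the points per key; an empty cluster keeps its center."""
    groups = defaultdict(list)
    for idx, pt in pairs:
        groups[idx].append(pt)
    return [mean(groups[i]) if i in groups else centers[i] for i in range(len(centers))]


def lsh_kmeans(points, k, planes, iters=10):
    """Run a fixed number of map/reduce rounds from LSH-seeded centers."""
    centers = seed_centers(points, k, planes)
    for _ in range(iters):
        pairs = [map_assign(p, centers) for p in points]  # parallel in a real job
        centers = reduce_update(pairs, centers)
    return centers
```

For example, `lsh_kmeans(points, 2, planes=[[1.0, 1.0]])` on two point clouds lying on opposite sides of the line x + y = 0 seeds one center per cloud. In practice the hyperplanes would presumably be drawn at random, and the map step would run on separate workers.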