Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (1): 84-88. DOI: 10.3778/j.issn.1002-8331.1709-0071

• Big Data and Cloud Computing •

Efficient ML-kNN Algorithm on Large Data Set

LU Kai, XU Hua

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2019-01-01 Published: 2019-01-07

Abstract: The multi-label k-nearest neighbor (ML-kNN) algorithm is a lazy learning approach that has been applied successfully in real-world applications. As the volume of information keeps growing, ML-kNN needs to be applied to large data sets. This paper first uses a clustering algorithm to partition the data set into several parts and then runs ML-kNN classification within each part. A series of experiments is carried out on four data sets of different sizes. The experimental results show that the ML-kNN algorithm based on this idea performs slightly better in accuracy, performance, and efficiency.

Key words: multi-label classification, multi-label k-nearest neighbor (ML-kNN), clustering, large data sets