Computer Engineering and Applications (计算机工程与应用), 2014, Vol. 50, Issue (15): 101-106.

• Databases, Data Mining, Machine Learning •

Parallel algorithm for computing incomplete information systems under big data

JIANG Lin (姜麟), MI Yunlong (米允龙), WANG Tian (王添)

  1. Faculty of Science, Kunming University of Science and Technology, Kunming 650500, China
  • Online: 2014-08-01 Published: 2014-08-04

Abstract: The lower and upper approximations are fundamental concepts in rough set theory, and computing them is the basis of mining massive data. Classical approximation-space algorithms are ill-suited to massive data, and even less suited to massive data with missing information. To address this, the paper analyzes the characteristics of massive data with missing information in depth and, building on the MapReduce programming model, proposes a parallel algorithm for computing approximation spaces within the MapReduce framework, so that massive data with missing information can be processed. Experimental results demonstrate the effectiveness of the proposed parallel algorithm.

Key words: MapReduce, data mining, massive data, rough set, incomplete information system, approximations
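
To make the central notion concrete, the following is a minimal sketch of lower and upper approximations in an incomplete information system, arranged in a map/reduce style. It is not the authors' algorithm, which the abstract does not detail. It assumes the tolerance relation commonly used for incomplete systems (a missing value, written `*` here, matches any value), and the names `tolerant`, `map_phase`, `reduce_phase`, and `approximations` are hypothetical and run in a single process rather than on a real MapReduce cluster.

```python
from collections import defaultdict

MISSING = "*"  # assumed marker for a missing attribute value


def tolerant(x, y):
    """True when two attribute tuples agree wherever both values are known."""
    return all(a == b or a == MISSING or b == MISSING for a, b in zip(x, y))


def map_phase(split, table):
    """Map step: for each object id in this split, emit (id, tolerant id) pairs."""
    for i in split:
        for j, row in table.items():
            if tolerant(table[i], row):
                yield i, j


def reduce_phase(pairs):
    """Reduce step: group the emitted pairs by key into tolerance classes."""
    classes = defaultdict(set)
    for i, j in pairs:
        classes[i].add(j)
    return classes


def approximations(table, target, n_splits=2):
    """Return (lower, upper) approximations of `target` under the tolerance relation."""
    ids = sorted(table)
    # Simulate data partitioning: each split would go to a separate mapper.
    splits = [ids[k::n_splits] for k in range(n_splits)]
    pairs = (pair for split in splits for pair in map_phase(split, table))
    classes = reduce_phase(pairs)
    lower = {i for i, cls in classes.items() if cls <= target}  # class inside target
    upper = {i for i, cls in classes.items() if cls & target}   # class meets target
    return lower, upper


# Toy incomplete information system: object id -> attribute values.
table = {
    1: ("a", "x"),
    2: ("a", "*"),
    3: ("b", "x"),
    4: ("b", "y"),
    5: ("*", "y"),
}
lower, upper = approximations(table, target={1, 2})
print(sorted(lower), sorted(upper))  # prints: [1] [1, 2, 5]
```

In this toy table, object 2 is tolerant with object 5 because of the missing values, so only object 1 lies certainly in the target set, while objects 1, 2, and 5 lie possibly in it. The pairwise comparison inside `map_phase` is quadratic in the number of objects, which is the kind of cost a partitioned, parallel formulation is meant to spread across workers.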