Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (6): 134-138.

• Database, Signal and Information Processing •

Parallel group data mining model in Internet environment

MA Bingchuan, ZHAO Shuliang, WANG Wei

  1. Hebei Province Key Laboratory of Computational Mathematics and Applications, College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050016, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2012-02-21 Published: 2012-02-21

Abstract: With the development of Internet technology, distributed data mining is receiving increasing attention. Distributed data mining urgently needs a loosely coupled, parallel data mining architecture that can aggregate multiple network functions as its communication medium. Based on an analysis of the classic parallel data mining models PADMA and BODHI, and taking practical needs into account, this paper proposes a new parallel distributed data mining model, PADMAN. The model adopts a divide-and-conquer strategy: data mining tasks are partitioned and distributed to data mining groups, and the groups mine in parallel. Because the model is agent-based, each basic data mining unit is autonomous. In addition, both group clients and global clients can connect over a wireless network, which makes using and accessing the system more flexible for users. The divide-and-conquer strategy also gives the model good modularity and scalability.

Key words: data mining, divide-and-conquer strategy, Agent, distributed data mining
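
The divide-and-conquer idea in the abstract (partition the mining task, let groups mine in parallel, then combine the results) can be illustrated with a minimal sketch. This is not the PADMAN implementation, which the abstract does not detail. The function names `partition`, `mine_group`, and `mine_parallel` are hypothetical, item counting stands in for an unspecified mining task, and local threads stand in for the networked, agent-based mining groups.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(transactions, n_groups):
    """Divide step (assumed scheme): deal transactions round-robin into n_groups parts."""
    return [transactions[i::n_groups] for i in range(n_groups)]


def mine_group(transactions):
    """Local step: one group counts, per item, how many of its transactions contain it."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))  # set() so an item counts once per transaction
    return counts


def mine_parallel(transactions, n_groups, min_support):
    """Combine step: run the groups concurrently, merge counts, keep frequent items."""
    parts = partition(transactions, n_groups)
    # Threads are only a stand-in for independent mining groups on a network.
    with ThreadPoolExecutor(max_workers=n_groups) as pool:
        local_counts = list(pool.map(mine_group, parts))
    total = sum(local_counts, Counter())
    return {item: c for item, c in total.items() if c >= min_support}


transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
frequent = mine_parallel(transactions, n_groups=2, min_support=3)
```

Because each group works only on its own partition and returns a mergeable summary, adding a group changes only `n_groups`, which is one way to read the modularity and scalability claim.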