Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (19): 76-79.

• Research and Development, Design, and Testing •

Data mining platform-WEKA and secondary development on WEKA

CHEN Hui-ping¹, LIN Li-li¹, WANG Jian-dong², MIAO Xin-rui¹

  1. Computer & Information Engineering College, Hohai University, Changzhou, Jiangsu 213022, China
  2. College of Information Science & Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  • Received: 2007-09-03 Revised: 2008-02-22 Online: 2008-07-01 Published: 2008-07-01
  • Contact: CHEN Hui-ping


Abstract: This paper runs data mining tests on WEKA, an open-source data mining tool, analyzes the test results, and identifies the main problems of the WEKA system. To address WEKA's weakness in clustering, the paper extends its clustering algorithms through secondary development on the open-source WEKA platform. It describes how the k-medoids substitution algorithm is embedded into WEKA, making full use of the existing WEKA classes and visualization functions, and compares the embedded algorithm with WEKA's original clustering algorithms. The k-medoids substitution algorithm improves on the accuracy of the traditional k-medoids algorithm and avoids getting trapped in a local optimum. It is also relatively insensitive to the choice of initial points and yields better clustering results.

Key words: data mining, WEKA platform, clustering, k-medoids substitution algorithm
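
The abstract does not spell out the steps of the k-medoids substitution algorithm, so the following Java sketch only illustrates the general idea behind medoid substitution: a generic PAM-style loop that swaps a medoid for a non-medoid point whenever the swap lowers the total distance cost. The class name `KMedoidsSketch`, its methods, and the sample data are hypothetical and are not taken from the paper or from WEKA. Wiring such a routine into WEKA would additionally require exposing it through WEKA's clusterer methods such as `buildClusterer(Instances)`, `clusterInstance(Instance)`, and `numberOfClusters()`, which this sketch leaves out.

```java
import java.util.Arrays;

// Hypothetical illustration of a PAM-style medoid swap; not the paper's code.
public class KMedoidsSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Sum over all points of the distance to the nearest medoid. */
    static double totalCost(double[][] points, int[] medoids) {
        double cost = 0.0;
        for (double[] p : points) {
            double best = Double.MAX_VALUE;
            for (int m : medoids) {
                best = Math.min(best, distance(p, points[m]));
            }
            cost += best;
        }
        return cost;
    }

    /** True if index v is currently used as a medoid. */
    static boolean contains(int[] medoids, int v) {
        for (int m : medoids) {
            if (m == v) {
                return true;
            }
        }
        return false;
    }

    /**
     * Starting from the given medoid indices, try substituting each medoid
     * with each non-medoid point and keep any swap that strictly lowers the
     * total cost. Stops when a full pass finds no improving swap.
     */
    static int[] cluster(double[][] points, int[] initialMedoids) {
        int[] medoids = initialMedoids.clone();
        double bestCost = totalCost(points, medoids);
        boolean improved = true;
        while (improved) {
            improved = false;
            for (int i = 0; i < medoids.length; i++) {
                for (int candidate = 0; candidate < points.length; candidate++) {
                    if (contains(medoids, candidate)) {
                        continue;
                    }
                    int previous = medoids[i];
                    medoids[i] = candidate;          // tentative substitution
                    double cost = totalCost(points, medoids);
                    if (cost < bestCost - 1e-12) {
                        bestCost = cost;             // keep the swap
                        improved = true;
                    } else {
                        medoids[i] = previous;       // undo the swap
                    }
                }
            }
        }
        return medoids;
    }

    public static void main(String[] args) {
        // Two well-separated groups; both initial medoids sit in the first group.
        double[][] points = {
            {0, 0}, {0, 1}, {1, 0},
            {10, 10}, {10, 11}, {11, 10}
        };
        int[] medoids = cluster(points, new int[] {0, 1});
        Arrays.sort(medoids);
        System.out.println("medoids = " + Arrays.toString(medoids));
        System.out.println("cost    = " + totalCost(points, medoids));
    }
}
```

On this toy data the loop moves one medoid out of the first group and into the second, even though both initial medoids start in the same group. This is the kind of reduced dependence on initial points that the abstract attributes to the substitution approach, although the paper's actual procedure may differ from this sketch.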
