Data mining platform-WEKA and secondary development on WEKA

Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (19): 76-79.

• R&D, Design, Testing •

CHEN Hui-ping¹, LIN Li-li¹, WANG Jian-dong², MIAO Xin-rui¹

1. Computer & Information Engineering College, Hohai University, Changzhou, Jiangsu 213022, China
2. College of Information Science & Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

Received: 2007-09-03 Revised: 2008-02-22 Online: 2008-07-01 Published: 2008-07-01
Contact: CHEN Hui-ping

Abstract

Abstract: This paper runs data mining tests on WEKA, an open source data mining tool, analyzes the results, and identifies the main problems of the WEKA system. To address WEKA's weakness in clustering, the paper carries out secondary development on the WEKA platform to extend its clustering algorithms. It describes how the k-medoids substitution method was embedded into WEKA, making full use of the classes and visualization functions of the open source code, and compares the embedded algorithm with WEKA's original clustering algorithms. The k-medoids substitution method improves on the accuracy of the traditional k-medoids method and keeps it from converging to a local optimum. It is also relatively insensitive to the choice of initial points and yields better clustering results.

Key words: data mining, WEKA platform, clustering, k-medoids substitution algorithm
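
For orientation, the traditional k-medoids method that the abstract uses as its baseline can be sketched as follows. This is a minimal, generic swap-based version written in Python for illustration only. It is not the paper's k-medoids substitution method, whose details are not given in the abstract, and it does not use the WEKA API. The function names `total_cost` and `k_medoids` and the `seed` parameter are assumptions introduced here.

```python
import math
import random


def total_cost(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def k_medoids(points, k, dist=math.dist, seed=0):
    """Generic swap-based k-medoids (illustrative baseline, not the paper's variant).

    Starts from k randomly chosen medoids and accepts any swap of a medoid
    with a non-medoid that lowers the total cost, until no swap helps.
    Returns (medoid indices, cluster label per point, final cost).
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # random initial points
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                # Try replacing the i-th medoid with the candidate point.
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best = trial, cost
                    improved = True
    # Assign each point to the position of its nearest medoid.
    labels = [
        min(range(k), key=lambda j: dist(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels, best


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(k_medoids(data, k=2))
```

Because this search only accepts improving swaps from a random start, it can in general stop at a local optimum that depends on the initial medoids, which is the weakness the abstract says the substitution method is meant to reduce.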

CHEN Hui-ping, LIN Li-li, WANG Jian-dong, MIAO Xin-rui. Data mining platform-WEKA and secondary development on WEKA[J]. Computer Engineering and Applications, 2008, 44(19): 76-79.
