Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (6): 208-211.

• Graphics, Images, and Pattern Recognition •

A new unsupervised discretization method for continuous attributes

New discretization method for numerical attributes based on clustering and merging

HUA Haiyang, ZHAO Huaici

  1. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-02-21 Published: 2011-02-21

Abstract: This paper proposes CAMNA (Clustering and Merging on Numerical Attributes), an unsupervised, clustering-based algorithm for discretizing continuous attributes. In the first step, CAMNA clusters the values of a numerical attribute to partition its range into a set of discrete intervals. In the second step, it merges adjacent intervals under the guidance of class-distribution information, repeating until a satisfactory discretization scheme is reached. Experiments comparing several discretization algorithms show that CAMNA retains high execution efficiency while producing a more reasonable discretization. Decision trees built by C4.5 on the discretized data are smaller, yield fewer classification rules, and achieve higher classification accuracy.

Key words: decision tree, numerical attributes, clustering intervals, classification
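
The two-step idea in the abstract (cluster the values into intervals, then merge adjacent intervals using class information) can be illustrated with a minimal Python sketch. It is not the CAMNA algorithm itself, because the abstract does not specify the clustering method or the merge criterion. Both are assumptions here: step 1 uses a simple one-dimensional single-linkage clustering with a hypothetical `gap` threshold, and step 2 merges neighbouring intervals whenever they share the same majority class label. All function names are illustrative.

```python
from collections import Counter


def cluster_intervals(values, gap):
    """Step 1 (assumed method): 1-D single-linkage clustering.

    Sorted distinct values stay in the same cluster while the distance
    to the previous value is at most `gap`; each cluster becomes a
    closed interval (min, max).
    """
    ordered = sorted(set(values))
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] <= gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return [(c[0], c[-1]) for c in clusters]


def majority_label(interval, values, labels):
    """Most frequent class label among the samples inside `interval`."""
    lo, hi = interval
    counts = Counter(lab for v, lab in zip(values, labels) if lo <= v <= hi)
    return counts.most_common(1)[0][0]


def merge_intervals(intervals, values, labels):
    """Step 2 (assumed criterion): merge adjacent intervals that share
    the same majority class, repeating until no merge is possible."""
    merged = list(intervals)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged) - 1):
            a, b = merged[i], merged[i + 1]
            if majority_label(a, values, labels) == majority_label(b, values, labels):
                merged[i:i + 2] = [(a[0], b[1])]
                changed = True
                break
    return merged


def discretize(values, labels, gap):
    """Cluster first, then merge; returns a list of (low, high) intervals."""
    return merge_intervals(cluster_intervals(values, gap), values, labels)


values = [1.0, 1.2, 1.4, 3.0, 3.1, 5.0, 5.2, 8.0, 8.3]
labels = ["a", "a", "a", "a", "a", "b", "b", "a", "a"]
print(discretize(values, labels, gap=0.5))
# [(1.0, 3.1), (5.0, 5.2), (8.0, 8.3)]
```

In this toy data set the clustering step yields four intervals. The first two are merged because both are dominated by class `a`, while the `b` interval in the middle keeps the remaining `a` interval separate, since only adjacent intervals are candidates for merging.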