Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (15): 148-150.

• Database, Signal and Information Processing •

Improved decision tree algorithm based on database query

YANG Yi-zhan,LI Xiao-ping,DUAN Xia-xia   

  1. School of Electronical & Mechanical Engineering,Xidian University,Xi’an 710071,China
  • Received:2007-08-29 Revised:2007-10-29 Online:2008-05-21 Published:2008-05-21
  • Contact: YANG Yi-zhan

Abstract: ID3, a core classification algorithm in data mining, is valued for its simple construction, strong learning ability, and fast classification. Because it was inherited from machine learning, however, it integrates poorly with databases and can only handle small data sets, which limits its practicality. This paper keeps the basic approach of the original ID3 algorithm and improves its core section: using embedded SQL, the algorithm queries the target database directly, processes the returned data, and finally produces a classification decision table that is stored in the database. Experiments show that the improved ID3, which combines the efficiency of SQL with the flexibility of the C language, classifies large volumes of data efficiently and seamlessly and greatly improves execution efficiency.

Key words: data mining, decision tree, ID3, embedded SQL, classification
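
The idea of letting the database do the counting for ID3 can be illustrated with a minimal sketch. It is not the authors' implementation, which uses embedded SQL in C. Here Python's standard `sqlite3` module stands in for embedded SQL, and the table `samples`, its columns, and the function names are all hypothetical. The sketch covers only the attribute-selection step: each information gain is computed from `GROUP BY` counts returned by the database, so the raw rows never have to be loaded into program memory.

```python
import math
import sqlite3


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(conn, table, attr, label):
    """Information gain of `attr` with respect to `label`.

    Identifiers are interpolated into the SQL text, so they must come
    from a trusted schema (an assumption of this sketch).
    """
    # Class distribution of the whole table, aggregated by the database.
    class_counts = [n for (n,) in conn.execute(
        f"SELECT COUNT(*) FROM {table} GROUP BY {label}")]
    total = sum(class_counts)

    # Class distribution inside each value of the candidate attribute.
    by_value = {}
    for value, _cls, n in conn.execute(
            f"SELECT {attr}, {label}, COUNT(*) FROM {table} "
            f"GROUP BY {attr}, {label}"):
        by_value.setdefault(value, []).append(n)

    # Weighted entropy that remains after splitting on the attribute.
    remainder = sum(sum(c) / total * entropy(c) for c in by_value.values())
    return entropy(class_counts) - remainder


def best_attribute(conn, table, attrs, label):
    """Attribute with the highest information gain (the ID3 split choice)."""
    return max(attrs, key=lambda a: information_gain(conn, table, a, label))


def build_demo_db():
    """In-memory toy table; the data are invented for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (outlook TEXT, windy TEXT, play TEXT)")
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", [
        ("sunny", "yes", "no"),
        ("sunny", "no", "no"),
        ("rain", "yes", "yes"),
        ("rain", "no", "yes"),
    ])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(best_attribute(conn, "samples", ["outlook", "windy"], "play"))
```

A full tree builder would repeat this selection on each partition, typically by adding a `WHERE` clause for the attribute values already fixed on the path from the root, and would write the resulting rules back to a table, as the abstract describes for the decision table.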