Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (24): 121-124.

• Database, Signal and Information Processing •

Frequent induced subtree mining algorithm using encoding

YIN Siqing1, KONG Pengcheng2, ZHANG Sulan2

  1. School of Software, North University of China, Taiyuan 030051, China
  2. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China

  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-08-21 Published: 2011-08-21

Abstract: Based on the characteristics of frequent induced subtrees, this paper presents an encoding-based mining algorithm called EFITM. The original database is represented with a breadth-first encoding, which minimizes the encoded size of each individual projection in the projection database. The projection database of each node on the rightmost path of a subtree is represented by encoded intervals, which reduces the size of the projection database as a whole and thereby improves the efficiency of frequent induced subtree mining. Experimental results confirm the correctness of the EFITM algorithm and show that it achieves high mining efficiency.

Key words: data mining, frequent induced subtree, projection database, encoding
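
The abstract does not spell out the encoding that EFITM uses, so the following is only a minimal sketch of what a breadth-first encoding of one labeled tree might look like. The function name `bfs_encode`, the nested-tuple tree format, and the `(parent_index, label)` record layout are all assumptions made for illustration and are not taken from the paper.

```python
from collections import deque


def bfs_encode(tree):
    """Encode a rooted ordered labeled tree in breadth-first order.

    `tree` is a nested tuple (label, [children]) -- a hypothetical input
    format chosen for this sketch. The result is a list of
    (parent_index, label) pairs, where parent_index is the position of
    the parent in the same list and the root uses -1.
    """
    encoding = []
    queue = deque([(tree, -1)])
    while queue:
        (label, children), parent_index = queue.popleft()
        index = len(encoding)  # position of this node in breadth-first order
        encoding.append((parent_index, label))
        for child in children:
            queue.append((child, index))
    return encoding


# A has children B and C; B has a single child D.
example = ("A", [("B", [("D", [])]), ("C", [])])
print(bfs_encode(example))
# [(-1, 'A'), (0, 'B'), (0, 'C'), (1, 'D')]
```

Because nodes are numbered level by level, the children of any node occupy a contiguous run of positions, which is one plausible reason an interval-based representation of projections can stay compact; whether EFITM exploits exactly this property is not stated in the abstract.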