Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (20): 31-34.DOI: 10.3778/j.issn.1002-8331.2009.20.009

• 博士论坛 • Previous Articles     Next Articles

Research on algorithm for mining weighted maximal frequent subgraphs

WANG Ying-long1,YANG Jun2,ZHOU Fa-guo1,TANG Jian-jun2   

  1. 1.School of Software,Jiangxi Agriculture University,Nanchang 330045,China
    2.College of Computer and Information Engineering,Jiangxi Agriculture University,Nanchang 330045,China
  • Received:1900-01-01 Revised:2009-05-11 Online:2009-07-11 Published:2009-07-11
  • Contact: WANG Ying-long

加权最大频繁子图挖掘算法的研究

王映龙1,杨 珺2,周法国1,唐建军2   

  1. 1.江西农业大学 软件学院,南昌 330045
    2.江西农业大学 计算机与信息工程学院,南昌 330045
  • 通讯作者: 王映龙

Abstract: Frequent subgraph mining is an active research topic in the data mining community.Another problem is previous frequent subgraph mining algorithms treat graphs uniformly while graphs have different importance actually.To solve the above two problems,authors propose a new depth-first algorithm to discover weighted maximal frequent subgraphs only.Firstly,to lower the complexity of computing canonical code of adjacency matrix of graph,a new vertex sorting strategy is introduced.Meanwhile,the sorting strategy can also speed the matching process of canonical codes.Secondly,weighted maximal frequent subgraph is defined,which can not only discover important maximal subgraph,but also inherit the property of anti-monotony.Thus,the speed of pruning is quickened.Experimental results show the proposed algorithm can not only reduce the number of discovered graph patterns,but also is efficient.

Key words: data mining, weighted maximal frequent subgraph, adjacency matrix, canonical code

摘要: 如何从大量的图中挖掘出令人感兴趣的子图模式已经成为数据挖掘领域研究的热点之一。传统的频繁子图挖掘方法对满足最小支持度阈值的子图同等对待,但在真实数据库中不同的子图往往具有不同的重要程度。为解决上述问题,提出了一种深度优先的挖掘加权最大频繁子图的新算法。首先给出了一种新的用于计算图的邻接矩阵规范编码的结点排序策略,大大降低了求图规范编码的复杂度,并可以加速子图规范编码匹配的速度。其次,给出了加权最大频繁子图的定义,不仅可以找出较为重要的最大频繁子图,而且可以使挖掘结果同样具有反单调性,从而可加速剪枝。实验结果表明,提出的算法不仅可以有效地减少挖掘结果的数量,而且具有较高的效率。

关键词: 数据挖掘, 最大加权频繁子图, 邻接矩阵, 规范编码