计算机工程与应用 ›› 2015, Vol. 51 ›› Issue (3): 112-116.

• 数据库、数据挖掘、机器学习 • 上一篇    下一篇

挖掘不确定频繁子图的改进算法的研究

胡  健1,何林波2,毛伊敏1,杨  健2   

  1. 1.江西理工大学 应用科学学院,江西 赣州 341000
    2.江西理工大学 信息工程学院,江西 赣州 341000
  • 出版日期:2015-02-01 发布日期:2015-01-28

Research of improved mining frequent subgraph patterns in uncertain graph databases

HU Jian1, HE Linbo2, MAO Yimin1, YANG Jian2   

  1. 1.Applied Science Institute of Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
    2.School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Online:2015-02-01 Published:2015-01-28

摘要: 鉴于图结构能简单方便地描绘复杂的数据以及实际应用中图数据的获得具有不确定性,不确定频繁子图挖掘算法得到广泛的研究。目前一个典型的图挖掘算法是MUSE,但MUSE算法存在期望支持度计算消耗大、时间效率不够高等问题。针对此问题提出了一种基于划分思想混合搜索策略的不确定子图挖掘算法EDFS,它用改进过的GSpan算法进行不确定的子图数据预处理,用裁剪子图模式的搜索空间裁剪不确定子图数据,用基于划分思想的混合策略进行频繁子图的挖掘。子图同构与边存在概率的实验结果证明了EDFS算法能更高效地挖掘出不确定数据频繁子图。

关键词: 不确定图, 图挖掘, 频繁子图集, 划分思想, 混合策略

Abstract: Mining frequent subgraph patterns in graph databases is a popular and important problem which has wide application in lots of domains. At the moment, the MUSE is a typical algorithm for graph mining, but its expected supports computation costs a lot and the time efficiency is so low. So this paper proposes a method that combines classification thought with BFS thought to find frequent subgraph patterns(EDFS). It uses improved GSpan algorithm to deal with uncertain graphs to reduce the space of subgraph patterns, then integrates classification thought with BFS thought to mine frequent subgraph patterns. The subgraph isomorphism tests and the edge whether existing tests indicate that EDFS is more efficient than MUSE.

Key words: uncertain graph, graph mining, frequent subgraph patterns, classification thought, mixed algorithm