Computer Engineering and Applications, 2015, Vol. 51, Issue 9: 135-141.

Parallel mining on label-constraint proximity pattern

ZHENG Haiyan1,2, WANG Yuanfang2, XIONG Zheng1, LI Kunming1, CHONG Zhihong2, YIN Fei1   

  1. Smart Grid Product Center, Jiangsu Frontier Electric Technology Co. Ltd, Nanjing 211189, China
  2. School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
  • Online: 2015-05-01 Published: 2015-05-15

标签集约束近似频繁模式的并行挖掘

郑海雁1,2,王远方2,熊  政1,李昆明1,崇志宏2,尹  飞1   

  1. 江苏方天电力技术有限公司 智能电网产品中心,南京 211189
  2. 东南大学 计算机科学与工程学院,南京 211189

Abstract: The proximity pattern is derived from the frequent pattern and combines the characteristics of frequent itemsets and frequent subgraphs. Research on proximity patterns has concentrated on unlabeled graphs, with applications mainly in social networks, the semantic Web, and smart grids. Because proximity pattern mining involves both frequent itemset mining and frequent subgraph mining, existing frequent pattern mining methods cannot be applied to it directly. Building on the proximity pattern structure, this paper extends it to labeled graphs, introduces a label-set constraint, and designs the LCPP (Label-Constraint Proximity Pattern) mining algorithm. The algorithm is deployed in parallel on the MapReduce computing model, compensating for the inefficiency of the pFP algorithm on large-scale data. Experimental results show that the parallel algorithm improves computing speed and scales well, and that LCPP is an excellent complement to pFP.

Key words: proximity pattern, label-set constraint, parallelization

摘要: 近似频繁模式衍生于频繁模式,综合了频繁项集与频繁子图的特点。针对该模式的研究集中在无标签图上,其应用场景主要为社交网络、语义网络、智能电网等。近似频繁模式挖掘过程同时涉及频繁项集挖掘和频繁子图挖掘,因此已有的处理频繁模式挖掘算法无法较好地解决近似频繁模式挖掘问题。基于近似频繁模式结构,将其拓展到带标签图中,引入标签集约束,并设计标签集约束近似频繁模式挖掘算法LCPP(Label-Constraint Proximity Pattern),该算法并行部署在MapReduce计算模型中,弥补了开源pFP算法处理大规模数据时效率不高的缺点。实验结果验证了该算法的有效性和可扩展性,表明了LCPP算法是pFP算法的极佳补充。

关键词: 近似频繁模式, 标签集约束, 并行化
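
The abstract describes mining under a label-set constraint on the MapReduce model. As a rough illustration of that style of computation, the following is a minimal single-process sketch in Python of a label-filtered support count. It is not the LCPP algorithm, whose details the abstract does not give. The record layout of (item, label) pairs, the `allowed_labels` constraint, the `min_support` threshold, and all function names are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record, allowed_labels):
    """Emit (item, 1) for each distinct item whose label satisfies the constraint."""
    return [(item, 1) for item, label in set(record) if label in allowed_labels]


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the partial counts for one item."""
    return key, sum(values)


def frequent_items(records, allowed_labels, min_support):
    """Count label-filtered item support and keep items meeting min_support."""
    mapped = chain.from_iterable(map_phase(r, allowed_labels) for r in records)
    counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
    return {item: c for item, c in counts.items() if c >= min_support}


# Hypothetical toy data: each record is a list of (item, label) pairs.
records = [
    [("a", "user"), ("b", "user"), ("c", "device")],
    [("a", "user"), ("c", "device")],
    [("a", "user"), ("b", "user")],
]
result = frequent_items(records, allowed_labels={"user"}, min_support=2)
print(result)
```

In a real MapReduce deployment the map and reduce functions would run on separate workers and the framework would perform the shuffle. Here the three phases run in one process only to make the data flow visible.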