计算机工程与应用 ›› 2020, Vol. 56 ›› Issue (19): 86-93.DOI: 10.3778/j.issn.1002-8331.1911-0052

• 大数据与云计算 • 上一篇    下一篇

基于位编码链表的快速频繁模式挖掘算法研究

顾军华,苏鸣,张亚娟,张丹红   

  1. 1.河北工业大学 人工智能与数据科学学院,天津 300401
    2.河北省大数据计算重点实验室,天津 300401
  • 出版日期:2020-10-01 发布日期:2020-09-29

Research on Fast Frequent Pattern Mining Algorithm Based on Bitmap-Code List

GU Junhua, SU Ming, ZHANG Yajuan, ZHANG Danhong   

  1. 1.School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China
    2.Hebei Province Key Laboratory of Big Data Computing, Tianjin 300401, China
  • Online:2020-10-01 Published:2020-09-29

摘要:

多数基于FP-growth思想的频繁模式挖掘算法存在建树过程复杂、支持度计算繁琐的问题。针对这些问题,提出一种基于位编码链表(Bitmap-Code List,BC-List)的频繁项集挖掘算法(BC-List Frequent Itemsets Mining,BCLFIM)。该算法首先采用基于位图表示的节点编码模型生成位图树(BC-tree),以BC-tree的节点信息作为数据结构通过按位运算来快速获取BC-List的节点集,避免了复杂的交集运算,提高了连接效率;其次通过使用超集等价和支持度计数剪枝策略,缩小了挖掘频繁模式的搜索空间。实验结果证明,该算法相比于FIN算法和DFIN算法具有更快的挖掘速度。

关键词: 频繁项集挖掘, 关联规则, 剪枝策略, 位图编码

Abstract:

Most of the frequent pattern mining algorithms based on the FP-growth idea have the disadvantages of complex construction rules and cumbersome support calculations. This paper proposes a Frequent Item set Mining algorithm(BCLFIM) based on Bitmap-Code List(BC-List) to improve this problem. Firstly, in this algorithm, a node coding model based on bitmap  representation is adopted to generate BC-tree, and the node information of BC-tree is used as the data structure to quickly obtain the node set of BC-List by bitwise operation, which can reduce complicated intersection operation and improve connection efficiency. Secondly, the search space for mining frequent patterns is reduced by using the superset equivalence and support count prune strategy. Experimental show that the algorithm has faster mining speed than FIN and DFIN algorithms.

Key words: frequent item mining, association rules, pruning strategy, bitmap encoding