计算机工程与应用 ›› 2015, Vol. 51 ›› Issue (22): 154-157.

• 数据库、数据挖掘、机器学习 • 上一篇    下一篇

矩阵与前缀树方法挖掘频繁项集

丁邦旭,黄永青   

  1. 铜陵学院 数学与计算机学院,安徽 铜陵 244000
  • 出版日期:2015-11-15 发布日期:2015-11-16

Algorithm of matrix and Prefix-tree for mining frequent itemsets

DING Bangxu, HUANG Yongqing   

  1. School of Mathematics and Computer, Tongling University, Tongling, Anhui 244000, China
  • Online:2015-11-15 Published:2015-11-16

摘要: 传统频繁项集挖掘算法的执行效率较低。提出了一种基于矩阵与前缀树的频繁项集挖掘算法MPFI,能快速地挖掘事务数据库中的频繁项集。MPFI算法只需扫描事务数据库一次,构建垂直方向的二进制矩阵,应用二进制位向量表达频繁项集信息,利用前缀树压缩存储频繁项集的相关信息,不产生候选项集。理论分析与实验结果表明,MPFI算法能有效地提高频繁项集挖掘效率。

关键词: 频繁项集, 矩阵, 二进制, 前缀树

Abstract: Traditional algorithm of frequent itemsets mining’s execution efficiency is low. MPFI algorithm based on matrix and Prefix-tree is raised for frequent itemsets mining. It can quickly mining frequent itemsets of transaction database. MPFI algorithm only scan transaction database once, builds vertical binary matrix. Binary vector for frequent itemsets information and Prefix-tree data structure for compression storage of frequent itemsets is applied in the algorithm, without candidate itemsets. According to theoretical analysis and experimental results, MPFI algorithm can effectively improve the efficiency of frequent itemsets mining.

Key words: frequent itemsets, matrix, binary, Prefix-tree