Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (6): 101-105. DOI: 10.3778/j.issn.1002-8331.1508-0158

• Big Data and Cloud Computing •

Top-k Frequent Pattern Mining Algorithm Based on Nodesets

孙  俊,张曦煌   

  1. 江南大学 物联网工程学院,江苏 无锡 214122
  • Publication date: 2017-03-15 Release date: 2017-05-11

Top-k frequent patterns based on nodesets

SUN Jun, ZHANG Xihuang   

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2017-03-15 Published: 2017-05-11

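Abstract: Frequent pattern mining usually produces an overwhelming number of patterns, yet only a small fraction of them are used in practice. Top-k frequent pattern mining limits the number of frequent patterns by ranking patterns by frequency, which effectively improves algorithmic efficiency. This paper proposes the TPN (Top-k-Patterns based on Nodesets) algorithm, which uses the concept of Nodesets, compresses the data into a Poc-tree, and recomputes the minimum support through a Top-k-rank table to limit the number of candidate patterns generated. Experimental comparison with the ATFP and Top-k-FP-growth algorithms shows that the proposed algorithm is more efficient.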

Keywords: data mining, top-k, frequent patterns, nodesets

Abstract: Frequent pattern mining usually yields far too many patterns, yet only a small number of them are used in real applications. Mining top-rank-k frequent patterns, which limits the number of mined patterns by ranking them by frequency, therefore improves the efficiency of the algorithm. This paper proposes the TPN (Top-k-Patterns based on Nodesets) algorithm for mining top-k frequent patterns. TPN employs a new data structure, Nodesets, to represent patterns, compresses the data into a Poc-tree, and uses a top-k-rank table to recompute the minimum support, which limits the number of candidate patterns. Experiments compare TPN with ATFP and Top-k-FP-growth in terms of mining time on two datasets. The experimental results show that TPN mines faster than the two compared algorithms.

Key words: data mining, top-k, frequent patterns, nodesets
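
The abstract's central idea, raising the minimum support as better patterns are found so that fewer candidates are generated, can be illustrated with a minimal sketch. This is not the TPN algorithm itself: it assumes plain transaction-id sets in place of Nodesets and the Poc-tree, and a size-k min-heap in place of the Top-k-rank table. The function name `top_k_frequent_patterns` and the sample data are hypothetical.

```python
import heapq
from typing import Hashable, Iterable, List, Optional, Set, Tuple

Pattern = Tuple[Hashable, ...]


def top_k_frequent_patterns(
    transactions: Iterable[Iterable[Hashable]], k: int
) -> List[Tuple[Pattern, int]]:
    """Return k patterns with the highest supports (ties at rank k broken arbitrarily).

    Illustrative stand-in only: tidsets replace Nodesets/Poc-tree, and a
    min-heap of size k replaces the Top-k-rank table. Items must be mutually
    comparable (e.g. all strings) so that heap entries can be ordered.
    """
    if k <= 0:
        return []

    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            tidsets.setdefault(item, set()).add(tid)

    # Min-heap of (support, pattern); its root is the current k-th best support.
    heap: List[Tuple[int, Pattern]] = []

    def extend(prefix: Pattern, prefix_tids: Optional[Set[int]], candidates) -> None:
        for i, (item, tids) in enumerate(candidates):
            new_tids = tids if prefix_tids is None else prefix_tids & tids
            support = len(new_tids)
            # Dynamic minimum support: once k patterns are held, a candidate
            # must beat the weakest of them. Supersets never have higher
            # support, so the whole branch can be skipped.
            if len(heap) == k and support <= heap[0][0]:
                continue
            pattern = prefix + (item,)
            if len(heap) < k:
                heapq.heappush(heap, (support, pattern))
            else:
                heapq.heapreplace(heap, (support, pattern))
            extend(pattern, new_tids, candidates[i + 1:])

    # Visit frequent items first so the threshold rises as early as possible.
    ordered = sorted(tidsets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    extend((), None, ordered)

    ranked = sorted(heap, key=lambda entry: (-entry[0], entry[1]))
    return [(pattern, support) for support, pattern in ranked]


if __name__ == "__main__":
    sample = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "c"]]
    print(top_k_frequent_patterns(sample, 3))
    # [(('a',), 4), (('b',), 3), (('c',), 3)]
```

In this sketch the threshold only ever rises, so a branch pruned early never needs to be revisited. Presumably the Top-k-rank table gives TPN the same property, though the abstract does not spell out the details.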