Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (16): 1-12.DOI: 10.3778/j.issn.1002-8331.2004-0171

Previous Articles     Next Articles

Survey of High Utility Pattern Generate Strategy

GAO Man, HAN Meng, LEI Bingbing   

  1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750000, China
  • Online:2020-08-15 Published:2020-08-11

高效用模式产生策略综述

高曼,韩萌,雷冰冰   

  1. 北方民族大学 计算机科学与工程学院,银川 750000

Abstract:

High utility pattern mining is used to find serviceable information from the data for users. Due to the multiple existing algorithms of high utility pattern mining, how to choose the superior way and be in progress is the prevalent problem. To solve this problem, people consider the classification of high utility pattern mining algorithms firstly, and find out the corresponding algorithm according to these issues afterward. Multiple different kinds of algorithms can be divided, in the light of different perspectives, i.e. divided into tree-based and utility list-based methods from the types using data structures, divided into one-stage and two-stage methods from the implementation needs. In addition, the algorithms can be divided through the pruning strategy, such as projection, remain utility, minimum utility improvement, etc. This paper starts with analyzing one-stage and two-stage high utility methods, which aims at tree-based two-stage algorithm and list-based one-stage algorithm. And then, this paper explores the tree-based one from whether to generate candidates. Finally, this paper discusses the space reduction strategy of high utility mining, e.g., pruning strategy and projection technology. Through analysis, the one-stage algorithm is superior to the two-stage algorithm in terms of time and space, and the algorithm that does not produce candidate itemsets is superior to the algorithm that produces candidate itemsets in terms of time and space. Generally, the algorithm reduces the search space through a variety of pruning strategies.

Key words: high utility, pattern mining, tree, list, one-stage algorithm, two-stage algorithm, pruning strategy

摘要:

高效用模式挖掘用于从数据中找出对用户有用的信息。现有的高效用模式挖掘算法很多,如何选择更优的方法进行使用,是普遍存在的问题。要解决这个问题首先要了解高效用模式挖掘算法的分类,继而针对问题找出对应的算法。按照不同的角度可以划分多种不同类型的算法。从使用数据结构的类型,划分为基于树和基于效用列表的方法;从算法所需要经历的阶段,划分为一阶段和两阶段算法;还可以从算法使用的剪枝策略进行划分,如投影,保留效用,提高最小阈值等。首先对一阶段、两阶段高效用模式算法进行分析,主要分析基于树的两阶段算法和基于列表的一阶段算法。然后从是否产生候选分析基于树的高效用模式算法。最后分析高效用模式算法用到的缩减空间策略,如剪枝策略、投影技术等。通过分析得到一阶段算法在时间与空间上优于两阶段算法,不产生候选项集的算法在时间与空间上优于产生候选项集的算法,算法缩小搜索空间一般通过多种剪枝策略。

关键词: 高效用, 模式挖掘, 树, 列表, 一阶段算法, 二阶段算法, 剪枝策略