Computer Engineering and Applications, 2015, Vol. 51, Issue 24: 145-149.

Algorithm based on ordered tree for mining maximal frequent items from uncertain data

LIU Weiming1,2, KUAI Hailong1, CHEN Zhigang3, MAO Yimin1,4   

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  2. Faculty of Resources and Environmental Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  3. School of Software, Central South University, Changsha 410083, China
  4. Institute of Applied Science, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China

  Online: 2015-12-15    Published: 2015-12-30

基于有序树的不确定数据最大频繁项挖掘算法

刘卫明1,2,蒯海龙1,陈志刚3,毛伊敏1,4   

  1. 江西理工大学 信息工程学院,江西 赣州 341000
  2. 江西理工大学 资源与环境工程学院,江西 赣州 341000
  3. 中南大学 软件学院,长沙 410083
  4. 江西理工大学 应用科学学院,江西 赣州 341000

Abstract: To address the data and path redundancy of itemsets in the UF-tree, this paper designs the Sequential Compressed Uncertain Frequent Pattern Tree (SCUF-tree), which stores the different support values of an element in the node. This compresses storage space and makes it easier to port existing algorithms that mine maximal frequent itemsets from certain data. Drawing on the design of the MMFI algorithm, the paper then proposes UMMFI, an algorithm for mining maximal frequent itemsets from uncertain databases, which processes nodes level by level and one at a time using an NBN (Node By Node) strategy. Experimental results show that UMMFI achieves good time and space efficiency and good adaptability on uncertain databases.

Key words: maximal frequent itemsets in uncertain databases, Mining Maximal Frequent Items from Uncertain data (UMMFI) algorithm, Sequential Compressed Uncertain Frequent Pattern Tree (SCUF-tree), Node By Node (NBN) strategy
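
To make the mining target in the abstract concrete, the following is a minimal brute-force sketch of what "maximal frequent itemsets in an uncertain database" can mean. It is not the UMMFI algorithm and does not build an SCUF-tree. It assumes the expected-support model commonly paired with UF-tree style methods (each item in a transaction carries an existence probability, items are independent, and an itemset's support is the sum over transactions of the product of its item probabilities); the abstract does not state this definition. The toy database `DB`, the threshold `min_esup`, and both function names are hypothetical.

```python
from itertools import combinations
from math import prod

# Hypothetical toy uncertain database: each transaction maps an item
# to its existence probability (assumed independent across items).
DB = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.7, "b": 0.6},
    {"a": 0.8, "c": 0.9},
    {"b": 0.5, "c": 0.4},
]


def expected_support(itemset, db):
    """Expected support: sum over transactions containing every item
    of the product of those items' probabilities (assumed model)."""
    return sum(
        prod(t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def maximal_frequent_itemsets(db, min_esup):
    """Brute-force enumeration, for illustration only.

    An itemset is frequent if its expected support reaches min_esup;
    it is maximal if no proper superset is also frequent.
    """
    items = sorted({i for t in db for i in t})
    frequent = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if expected_support(c, db) >= min_esup
    ]
    # Keep only itemsets with no frequent proper superset.
    return [s for s in frequent if not any(s < other for other in frequent)]


if __name__ == "__main__":
    for s in maximal_frequent_itemsets(DB, min_esup=1.0):
        print(sorted(s), round(expected_support(s, DB), 2))
```

With the threshold set to 1.0 on this toy data, the itemsets {a, b} and {a, c} are frequent while {b, c} and {a, b, c} are not, so {a, b} and {a, c} are the maximal ones. Enumerating every candidate is exponential in the number of items; a tree-based method such as the one the abstract describes exists precisely to avoid that cost.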

摘要: 针对UF-tree中项集存在的数据和路径冗余的问题,设计了有序的压缩不确定树SCUF-tree,在节点中存储元素的不同支持度,达到压缩存储空间和方便移植已有的确定数据最大频繁项集算法的目的。结合最大频繁项集挖掘算法MMFI的设计思想,提出了一种挖掘不确定最大频繁项集算法UMMFI算法,并采取逐层逐个的NBN策略挖掘不确定最大频繁项集。实验结果表明,UMMFI算法具有较好的时空效益和适应性。

关键词: 不确定数据的最大频繁项集, 不确定数据最大频繁项挖掘(UMMFI)算法, 有序的压缩不确定树(SCUF-tree), 逐层逐个地处理节点(NBN)策略