Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (13): 287-300. DOI: 10.3778/j.issn.1002-8331.2305-0345

• Big Data and Cloud Computing •

Constrained Cross-Level High Utility Itemsets Mining over Data Stream

LIU Shujuan, HAN Meng, GAO Zhihui, MU Dongliang, LI Ang

  1. School of Computer Science & Engineering, North Minzu University, Yinchuan 750021, China
  • Online: 2024-07-01  Published: 2024-07-01

Abstract: Traditional high utility itemset mining algorithms cannot discover relationships between categories at different levels of abstraction, which motivated cross-level high utility itemset mining algorithms. Existing cross-level algorithms, however, handle only static data and cannot control the range of levels that are mined. To address both problems, this paper proposes a dynamic category list structure, DTUL, which stores and maintains the utility and category information of the itemsets in the current window. Based on this structure, it presents the first sliding-window-based constrained cross-level high utility itemset mining algorithms: CCLHM_DTU, which mines bottom-up, and CCLHM_UTD, which mines top-down. Extensive experiments on datasets that contain category information show that the proposed algorithms process data streams effectively and constrain the level range of the mined itemsets flexibly.

Key words: high utility itemsets mining, cross-level high utility itemset, data stream, sliding window, utility list
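
The problem setting described in the abstract can be illustrated with a small brute-force sketch. It is not the paper's DTUL structure, nor the CCLHM_DTU or CCLHM_UTD algorithms, none of which are specified on this page. The taxonomy `PARENT`, the sample `STREAM`, the level numbering (level 1 is the most general category), the one-transaction window step, and all function names are assumptions made for illustration only.

```python
from collections import deque
from itertools import combinations

# Hypothetical taxonomy: child -> parent (top-level categories have no entry).
PARENT = {"cola": "drinks", "juice": "drinks", "bread": "bakery", "cake": "bakery"}

# Hypothetical stream: each transaction maps a leaf item to its utility.
STREAM = [
    {"cola": 4, "bread": 2},
    {"juice": 3, "cake": 5},
    {"cola": 1, "cake": 6},
]


def ancestors(item):
    """Return the ancestors of an item, nearest first."""
    chain = []
    while item in PARENT:
        item = PARENT[item]
        chain.append(item)
    return chain


def level(item):
    """Assumed numbering: level 1 is the most general category."""
    return len(ancestors(item)) + 1


def generalize(transaction):
    """Add every ancestor category, with the summed utility of its leaves."""
    out = dict(transaction)
    for leaf, util in transaction.items():
        for anc in ancestors(leaf):
            out[anc] = out.get(anc, 0) + util
    return out


def mine_window(window, min_util, min_level, max_level, max_size=2):
    """Brute-force search for high utility itemsets within a level range."""
    extended = [generalize(t) for t in window]
    items = sorted(
        {i for t in extended for i in t if min_level <= level(i) <= max_level}
    )
    result = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            # Skip itemsets that pair an item with one of its own ancestors.
            if any(a in itemset for i in itemset for a in ancestors(i)):
                continue
            # Sum the itemset's utility over transactions that contain all of it.
            util = sum(
                sum(t[i] for i in itemset)
                for t in extended
                if all(i in t for i in itemset)
            )
            if util >= min_util:
                result[itemset] = util
    return result


def mine_stream(stream, window_size, **constraints):
    """Yield one result per full window; the deque evicts the oldest transaction."""
    window = deque(maxlen=window_size)
    for transaction in stream:
        window.append(transaction)
        if len(window) == window_size:
            yield mine_window(window, **constraints)
```

Calling `list(mine_stream(STREAM, 2, min_util=10, min_level=1, max_level=2))` mines both levels in each window, while setting `max_level=1` restricts the output to top-level categories. The exhaustive enumeration only makes the definitions concrete; a list-based structure such as the one the abstract describes would exist precisely to avoid this recomputation.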