Constrained Cross-Level High Utility Itemsets Mining over Data Stream

doi:10.3778/j.issn.1002-8331.2305-0345

Abstract

Traditional high utility itemset mining algorithms cannot discover relationships between categories at different levels of abstraction, which motivated the development of cross-level high utility itemset mining algorithms. Existing cross-level algorithms, however, can only handle static data and cannot control the range of levels that are mined. To address both problems, a dynamic classification list structure, DTUL, is proposed to store and maintain the utility and classification information of the itemsets in the current window. Based on this structure, this paper proposes the first constrained cross-level high utility itemset mining algorithms based on a sliding window: CCLHM_DTU, which mines bottom-up, and CCLHM_UTD, which mines top-down. Extensive experiments are conducted on datasets containing categorical information. The results show that the proposed algorithms handle data streams effectively and flexibly constrain the hierarchical range of the mined itemsets.

Key words: high utility itemsets mining, cross-level high utility itemset, data stream, sliding window, utility list
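
To make the problem setting concrete, the following is a minimal brute-force sketch of constrained cross-level high utility itemset mining over a sliding window. It is not the DTUL structure, CCLHM_DTU, or CCLHM_UTD, whose details the abstract does not give. The taxonomy, the level convention (a top-level category is level 1), the function names, and the parameters `min_util`, `min_level`, `max_level`, and `max_size` are all assumptions made for illustration, and item utilities are assumed to be positive.

```python
from collections import deque
from itertools import combinations

# Hypothetical taxonomy (child -> parent); top-level categories have no entry.
PARENT = {"apple": "fruit", "pear": "fruit", "milk": "dairy",
          "fruit": "food", "dairy": "food"}

def lineage(item):
    """Return the item followed by all of its ancestors."""
    chain = [item]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def level(item):
    """Depth in the taxonomy; a top-level category is level 1 (assumed)."""
    return len(lineage(item))

def item_utility(tx, g):
    """Utility of a (possibly generalized) item g: sum over leaves under g."""
    return sum(u for leaf, u in tx.items() if g in lineage(leaf))

def itemset_utility(tx, itemset):
    """Utility of an itemset in tx; zero unless every member occurs in tx."""
    parts = [item_utility(tx, g) for g in itemset]
    return sum(parts) if all(p > 0 for p in parts) else 0

def related(a, b):
    """True if one item is an ancestor of (or equal to) the other."""
    return a in lineage(b) or b in lineage(a)

def mine_window(window, min_util, min_level, max_level, max_size=2):
    """Brute-force search restricted to items whose level is in range."""
    all_items = {g for tx in window for leaf in tx for g in lineage(leaf)}
    candidates = sorted(g for g in all_items
                        if min_level <= level(g) <= max_level)
    result = {}
    for k in range(1, max_size + 1):
        for itemset in combinations(candidates, k):
            # Skip itemsets that mix an item with its own ancestor.
            if any(related(a, b) for a, b in combinations(itemset, 2)):
                continue
            u = sum(itemset_utility(tx, itemset) for tx in window)
            if u >= min_util:
                result[itemset] = u
    return result

def mine_stream(stream, window_size, **constraints):
    """Mine each full window as transactions arrive; the oldest one is evicted."""
    window = deque(maxlen=window_size)
    out = []
    for tx in stream:
        window.append(tx)
        if len(window) == window_size:
            out.append(mine_window(window, **constraints))
    return out

# Each transaction maps a leaf item to its utility in that transaction.
STREAM = [{"apple": 5, "milk": 2}, {"pear": 4, "milk": 3},
          {"apple": 1, "pear": 2}, {"milk": 6}]
results = mine_stream(STREAM, window_size=2,
                      min_util=9, min_level=2, max_level=2)
print(results[0])  # {('fruit',): 9, ('dairy', 'fruit'): 14}
```

This sketch re-enumerates every candidate for each window, which is exponential in the itemset size. A list-based structure such as the one the abstract describes would instead be updated incrementally as transactions enter and leave the window.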

LIU Shujuan, HAN Meng, GAO Zhihui, MU Dongliang, LI Ang. Constrained Cross-Level High Utility Itemsets Mining over Data Stream[J]. Computer Engineering and Applications, 2024, 60(13): 287-300.
