Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (6): 133-139. DOI: 10.3778/j.issn.1002-8331.1712-0254

Research on Mining Association Rules Based on Multi-Granularity Attribute Reduction

YANG Zhen, GENG Xiuli   

  1. Business School, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Online: 2019-03-15 Published: 2019-03-14

考虑多粒度属性约简的关联规则挖掘研究

杨  珍,耿秀丽   

  1. 上海理工大学 管理学院,上海 200093

Abstract: In the era of big data, it has become increasingly difficult for people to obtain the information they need, and data mining is currently a key technology for addressing this problem. The Apriori algorithm is a common data mining algorithm that uncovers potential association rules hidden in data. To address the problems of the traditional Apriori algorithm, such as frequent scans of the data and cumbersome generation of candidate itemsets, a weighted Apriori algorithm is proposed. Each repeated record is stored only once, and the ratio of its number of repetitions to the total number of records is taken as its weight, which compresses the data set. A binary Boolean matrix replaces the original data set, and the maximal frequent itemsets are obtained through AND operations on the matrix, which reduces the time complexity. Considering the redundancy of the original data and the imprecision of rough-set attribute reduction, an attribute reduction algorithm based on multi-granularity rough sets is applied before the association rules are extracted. The uncertainty of the information is described by the granularity of knowledge, and the attribute values are refined to improve the precision of the reduction and reduce the space complexity. Finally, the proposed algorithm is compared with the Apriori algorithm based on frequent matrices and with the original Apriori algorithm to verify its practicability and validity.

Key words: multi-granularity rough set, attribute reduction, binary, weighted Apriori algorithm
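
The weighting and Boolean-matrix steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`compress`, `support`, `frequent_itemsets`), the bitmask encoding of matrix rows, and the sample data are all assumptions made for illustration, and the sketch enumerates every frequent itemset rather than only the maximal ones. Duplicate records are stored once with weight equal to repetition count divided by total record count, and the support of an itemset is the sum of the weights of the rows that pass a bitwise AND test.

```python
from collections import Counter
from itertools import combinations


def compress(transactions, items):
    """Collapse duplicate records into unique Boolean rows plus weights.

    Assumption: each row is encoded as an integer bitmask, where bit i is
    set when items[i] occurs in the record. The weight of a row is its
    repetition count divided by the total number of records.
    """
    counts = Counter(frozenset(t) for t in transactions)
    total = len(transactions)
    rows, weights = [], []
    for record, count in counts.items():
        mask = 0
        for i, item in enumerate(items):
            if item in record:
                mask |= 1 << i
        rows.append(mask)
        weights.append(count / total)
    return rows, weights


def support(itemset, items, rows, weights):
    """Weighted support: total weight of rows that contain every item."""
    wanted = 0
    for item in itemset:
        wanted |= 1 << items.index(item)
    # The AND operation keeps only the wanted columns of each row.
    return sum(w for row, w in zip(rows, weights) if (row & wanted) == wanted)


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search over the compressed, weighted matrix."""
    items = sorted({item for t in transactions for item in t})
    rows, weights = compress(transactions, items)
    result = {}
    level = [(item,) for item in items]
    while level:
        kept = []
        for candidate in level:
            s = support(candidate, items, rows, weights)
            if s >= min_support:
                result[candidate] = s
                kept.append(candidate)
        # Build (k+1)-candidates whose k-subsets are all frequent.
        k = len(level[0]) + 1
        kept_set = set(kept)
        survivors = sorted({item for c in kept for item in c})
        level = [
            c for c in combinations(survivors, k)
            if all(sub in kept_set for sub in combinations(c, k - 1))
        ]
    return result


if __name__ == "__main__":
    # Hypothetical sample data: the record {"a", "b"} appears twice.
    data = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    for itemset, s in frequent_itemsets(data, 0.5).items():
        print(itemset, s)
```

Because the weights already sum to one, the weighted support in this sketch equals the usual relative support on the uncompressed data; the saving comes from testing each distinct record only once.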

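Abstract (Chinese version): In the era of big data, it has become harder for people to obtain the information they need, and data mining is currently the key technology for solving this problem. Apriori, a commonly used algorithm in data mining, uncovers the potential association rules behind data. To address the frequent data scans and cumbersome candidate-set generation of the traditional Apriori algorithm, a weighted Apriori algorithm is proposed: redundant records are stored only once, and the ratio of a record's repetition count to the total number of records is taken as its weight, which compresses storage space. A binary Boolean matrix replaces the original data set, and the maximal frequent itemsets are obtained through AND operations within the matrix, which lowers the time complexity. Considering the redundancy of the original data and the imprecision of rough-set attribute reduction, an attribute reduction algorithm based on multi-granularity rough sets is applied before the association rules are extracted; attribute values are refined through knowledge granularity to improve reduction precision and lower the space complexity. Finally, the proposed method is compared with the frequent-matrix-based Apriori algorithm and the original Apriori algorithm to verify its practicability and effectiveness.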

Key words (Chinese version): multi-granularity rough set, attribute reduction, binary, weighted Apriori algorithm