Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (24): 57-60.DOI: 10.3778/j.issn.1002-8331.1808-0400
CHEN Jian, ZHANG Xiaohong
陈 建,张小红
Abstract: To address the high data redundancy found in big data environments, a rough-set-based method for detecting approximately duplicated data is proposed that combines Shannon entropy with fuzzy integrated evaluation. First, the attributes of the data set are reduced using Shannon entropy. Next, fuzzy integrated evaluation is used to obtain importance weights for the attributes that remain after reduction. Finally, approximately duplicated data is detected according to the reduced attributes and their weights. Theoretical analysis and experimental comparison show that the method achieves high detection accuracy and efficiency on structured big data sets.
Key words: information entropy, fuzzy integrated evaluation, approximately duplicated data, attribute reduction, rough set
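The three stages named in the abstract (entropy-based attribute reduction, attribute weighting, weighted detection) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract gives no formulas, so the reduction rule (drop attributes whose entropy falls below a hypothetical `min_entropy`), the weighting (normalized entropy standing in for fuzzy integrated evaluation), the string similarity measure (`difflib.SequenceMatcher`), and the `threshold` value are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def reduce_attributes(records, min_entropy=0.5):
    """Stage 1 (assumed rule): keep attributes whose entropy exceeds min_entropy."""
    entropies = {a: shannon_entropy([r[a] for r in records]) for a in records[0]}
    return {a: h for a, h in entropies.items() if h > min_entropy}


def attribute_weights(entropies):
    """Stage 2 (stand-in for fuzzy integrated evaluation): normalized entropy."""
    total = sum(entropies.values())
    if not entropies or total == 0:
        return {}
    return {a: h / total for a, h in entropies.items()}


def similarity(r1, r2, weights):
    """Weighted sum of per-attribute string similarities, in [0, 1]."""
    return sum(
        w * SequenceMatcher(None, str(r1[a]), str(r2[a])).ratio()
        for a, w in weights.items()
    )


def find_near_duplicates(records, threshold=0.85, min_entropy=0.5):
    """Stage 3: return index pairs whose weighted similarity reaches threshold."""
    weights = attribute_weights(reduce_attributes(records, min_entropy))
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j], weights) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical sample data: "country" is constant, so it carries no information.
records = [
    {"name": "John Smith", "city": "Boston", "country": "US"},
    {"name": "Jon Smith", "city": "Boston", "country": "US"},
    {"name": "Alice Wong", "city": "Denver", "country": "US"},
    {"name": "Bob Lee", "city": "Austin", "country": "US"},
]
print(find_near_duplicates(records))
```

The pairwise loop is quadratic in the number of records, so a practical implementation for large data sets would need a blocking or sorting step before comparison; reducing the attribute set first only lowers the cost of each comparison.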
CHEN Jian, ZHANG Xiaohong. Approximately data detecting method based on fusion of information entropy and fuzzy integrated evaluation[J]. Computer Engineering and Applications, 2018, 54(24): 57-60.
陈 建,张小红. 信息熵与模糊综合评判融合的相似数据检测方法[J]. 计算机工程与应用, 2018, 54(24): 57-60.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.1808-0400
http://cea.ceaj.org/EN/Y2018/V54/I24/57