Computer Engineering and Applications, 2020, Vol. 56, Issue (5): 93-100. DOI: 10.3778/j.issn.1002-8331.1904-0187

• Big Data and Cloud Computing •

Power Consumption Big Data Privacy Protection Method Based on Attribute Classification

LIANG Xiaobing, XU Bin, ZHAI Feng, SHEN Bo

  1. China Electric Power Research Institute, Beijing 100192, China
  2. State Key Laboratory of Information Security, Institute of Information Engineering, CAS, Beijing 100093, China
  3. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Online: 2020-03-01 Published: 2020-03-06

Abstract:

In power consumption big data environments, non-interactive differential privacy models cannot provide accurate query results and incur high computational overhead. To address these problems, a differential privacy data publishing method based on the maximum information coefficient and data anonymization is proposed. First, a subset of privacy attributes is selected from the original data set as a feature set. Next, the maximum information coefficient is used to select the data highly correlated with this feature set as the privacy data set. Finally, a collaborative privacy protection algorithm is applied to the privacy data set, and a power consumption big data set satisfying differential privacy is published. Theoretical analysis and experimental results show that the proposed method improves the efficiency of big data privacy protection processing while effectively differentiating the sensitivity of query functions and improving the utility of the published data.

Key words: differential privacy, maximum information coefficient, data anonymization, data publishing
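
The pipeline summarized in the abstract (score attributes against a sensitive feature, keep the highly correlated ones as the privacy data set, then publish them under differential privacy) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' algorithm: the attribute names, the threshold of 0.5, and the privacy budget are hypothetical; a normalized mutual information on a single fixed grid stands in for the maximum information coefficient, which maximizes this ratio over many grids; and a Laplace-noised histogram over publicly known bounds stands in for the collaborative privacy protection algorithm, which the abstract does not detail.

```python
import math
import random
from collections import Counter


def bin_indices(values, bins, lo=None, hi=None):
    """Map values to equal-width bin indices in [0, bins - 1]."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    width = (hi - lo) / bins or 1.0  # constant column: fall back to width 1
    return [min(max(int((v - lo) / width), 0), bins - 1) for v in values]


def normalized_mutual_information(xs, ys, bins=4):
    """Mutual information on one bins x bins grid, divided by log(bins).

    Assumption: a fixed-grid stand-in for MIC, not MIC itself.
    """
    n = len(xs)
    bx, by = bin_indices(xs, bins), bin_indices(ys, bins)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = sum(
        (c / n) * math.log((c * n) / (px[i] * py[j]))
        for (i, j), c in pxy.items()
    )
    return mi / math.log(bins)


def select_privacy_attributes(table, feature, threshold=0.5):
    """Return attributes whose score against the feature column meets the threshold."""
    return [
        name
        for name, column in table.items()
        if name != feature
        and normalized_mutual_information(table[feature], column) >= threshold
    ]


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(values, bins, epsilon, lo, hi, rng):
    """Publish bin counts with Laplace(1/epsilon) noise.

    Adding or removing one record changes one count by 1, so the
    sensitivity is 1. The bounds lo and hi are assumed to be public.
    """
    counts = Counter(bin_indices(values, bins, lo, hi))
    return [counts.get(b, 0) + laplace_noise(1.0 / epsilon, rng) for b in range(bins)]


# Hypothetical toy data: one sensitive feature and two candidate attributes.
rng = random.Random(0)
daily_kwh = [rng.uniform(0.0, 10.0) for _ in range(200)]
table = {
    "daily_kwh": daily_kwh,                                             # sensitive feature
    "peak_kwh": [0.4 * v + rng.uniform(-0.1, 0.1) for v in daily_kwh],  # correlated
    "meter_batch": [rng.uniform(0.0, 1.0) for _ in daily_kwh],          # unrelated
}

selected = select_privacy_attributes(table, "daily_kwh")
released = {
    name: noisy_histogram(table[name], bins=4, epsilon=1.0, lo=0.0, hi=4.0, rng=rng)
    for name in selected
}
print(selected, released)
```

In this sketch only the attributes that score above the threshold receive noise, which mirrors the abstract's point that restricting protection to the correlated subset reduces both the processing cost and the noise added to the published data.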