Computer Engineering and Applications, 2020, Vol. 56, Issue (7): 116-121. DOI: 10.3778/j.issn.1002-8331.1812-0081

• Network, Communication and Security •

Non-interactive Queries Differential Privacy Protection Model in Big Data Environment

XU Bin, LIANG Xiaobing, SHEN Bo

  1. China Electric Power Research Institute, Beijing 100192, China
  2. State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  3. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Online: 2020-04-01  Published: 2020-03-28

Abstract:

In big data environments, non-interactive differential privacy cannot accurately answer and process large numbers of range queries. To address this problem, a privacy-preserving data query model based on the maximum information coefficient and machine learning is proposed. First, the maximum information coefficient is used to select low-correlation data from the original data set as the training sample set. Next, the training sample set is partitioned into blocks using the parallel composition property of differential privacy, which yields a privacy-protected training sample set. Finally, a linear regression algorithm is trained on this sample set to obtain a differentially private prediction model, which answers both the currently submitted queries and a large number of unknown queries in a privacy-preserving manner. Experimental results show that the proposed model improves the utility of the published data while also improving the efficiency of query processing.

Key words: differential privacy, maximum information coefficient, privacy protection, range query
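
The last two stages described in the abstract (block partitioning under parallel composition, then a linear regression model that answers range queries) can be illustrated with a minimal sketch. This is not the authors' implementation: the selection step based on the maximum information coefficient is omitted, the data are synthetic, and the function names, the equal-width blocks, and the feature map (bias, lower bound, upper bound) are all assumptions made for illustration. Neighbouring data sets are assumed to differ by adding or removing one record, so each block count has sensitivity 1.

```python
import numpy as np


def noisy_histogram(values, edges, epsilon, rng):
    """Release per-block counts with the Laplace mechanism.

    The blocks are disjoint, so each record affects exactly one count.
    By parallel composition the whole histogram costs epsilon in total,
    not epsilon per block.
    """
    counts, _ = np.histogram(values, bins=edges)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    return counts + noise


def build_training_set(noisy_counts, edges):
    """Answer every block-aligned range query from the noisy counts only."""
    prefix = np.concatenate(([0.0], np.cumsum(noisy_counts)))
    features, targets = [], []
    for i in range(len(edges) - 1):
        for j in range(i + 1, len(edges)):
            features.append([1.0, edges[i], edges[j]])  # bias, lo, hi
            targets.append(prefix[j] - prefix[i])
    return np.array(features), np.array(targets)


def fit_range_model(features, targets):
    """Ordinary least squares on noisy answers (pure post-processing)."""
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return weights


def predict_range_count(weights, lo, hi):
    """Predicted count of records in [lo, hi) from the fitted model."""
    return float(np.array([1.0, lo, hi]) @ weights)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.uniform(0.0, 100.0, size=10_000)  # synthetic stand-in data
    edges = np.linspace(0.0, 100.0, 21)          # 20 disjoint blocks
    noisy = noisy_histogram(data, edges, epsilon=1.0, rng=rng)
    X, y = build_training_set(noisy, edges)
    w = fit_range_model(X, y)
    print(predict_range_count(w, 23.5, 61.0))    # query not aligned to blocks
```

Because the regression sees only the noisy block counts, fitting it and answering any number of later queries consumes no further privacy budget. A purely linear model fits roughly uniform data well; skewed data would likely need richer features than this sketch uses.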