Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (10): 288-298. DOI: 10.3778/j.issn.1002-8331.2201-0457

• Network, Communication and Security •

DP-IMKP: Data Publishing Protection Method for Personalized Differential Privacy

ZHANG Xing (张星), ZHANG Xing (张兴), WANG Qingyang (王晴阳)

  1. School of Electronics and Information Engineering, Liaoning University of Technology, Jinzhou, Liaoning 121001, China
  • Online: 2023-05-15 Published: 2023-05-15

Abstract: Differential privacy is widely used for privacy protection in data publishing because it provides a strong privacy guarantee. However, data protected by differential privacy carry a large amount of injected noise, which reduces data utility. In addition, few existing studies address privacy-preserving publication of mixed-attribute datasets, and existing methods allocate the privacy budget unreasonably. This paper therefore proposes DP-IMKP, a differentially private method for publishing mixed-attribute data based on personalized privacy budget allocation. First, using mutual information and the correlations among attributes, a strategy for classifying sensitive attributes into levels is proposed, which quantifies the importance of each user attribute and matches each level with a corresponding degree of privacy protection. Second, drawing on optimal matching theory, a bipartite graph between privacy budgets and sensitive attributes is constructed so that a reasonable privacy budget is allocated to the sensitive attributes at each level. Third, combining information entropy with density-based optimization, the initial center selection and the dissimilarity measure of the classical k-prototype algorithm are improved, and the original dataset is clustered. Finally, the privacy budget allocated to each sensitive attribute is used to apply differential privacy protection to the cluster center values, preventing leakage of private information. Experiments show that, compared with similar methods, DP-IMKP has clear advantages in improving data utility and reducing the risk of data leakage.

Key words: differential privacy, k-prototype clustering, attribute classification, privacy budget allocation, mutual information, mixed data
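
The last step described in the abstract, protecting cluster center values with a separate privacy budget per attribute, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a noise mechanism, so the Laplace mechanism is assumed here, along with publicly known value bounds per attribute, cluster sizes treated as public, and a mean sensitivity of (high - low) / n. Only numeric attributes are handled, and the names `laplace_noise` and `perturb_centers` are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5              # uniform on [-0.5, 0.5)
    u = max(u, -0.5 + 1e-12)            # avoid log(0) at the endpoint u == -0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def perturb_centers(centers, cluster_sizes, bounds, budgets, rng):
    """Add Laplace noise to every numeric coordinate of every cluster center.

    centers:       list of dicts {attribute: mean value}, one per cluster
    cluster_sizes: list of record counts, one per cluster (assumed public)
    bounds:        {attribute: (low, high)}, assumed public value range
    budgets:       {attribute: epsilon}, per-attribute privacy budget
    """
    noisy = []
    for center, n in zip(centers, cluster_sizes):
        row = {}
        for attr, value in center.items():
            low, high = bounds[attr]
            sensitivity = (high - low) / n       # assumed sensitivity of a bounded mean
            scale = sensitivity / budgets[attr]  # smaller budget -> more noise
            row[attr] = value + laplace_noise(scale, rng)
        noisy.append(row)
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    centers = [{"age": 34.2, "income": 51000.0}, {"age": 58.7, "income": 32000.0}]
    sizes = [120, 80]
    bounds = {"age": (0.0, 100.0), "income": (0.0, 200000.0)}
    # Illustrative budgets only: the more sensitive attribute gets the smaller epsilon.
    budgets = {"age": 0.8, "income": 0.2}
    for row in perturb_centers(centers, sizes, bounds, budgets, rng):
        print(row)
```

In this sketch an attribute with a smaller budget receives noise with a larger scale, which is the general effect one would expect from assigning stricter protection to more sensitive attributes. How DP-IMKP actually derives the budgets (the bipartite matching step) and how it treats categorical attributes is not specified in the abstract and is not modeled here.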