DP-IMKP: Data Publishing Protection Method for Personalized Differential Privacy

doi:10.3778/j.issn.1002-8331.2201-0457

Abstract

Abstract: Differential privacy is widely used for privacy protection in data publishing because it provides a strong privacy guarantee. However, data protected by differential privacy carries a large amount of injected noise, which reduces data utility. In addition, existing methods offer few results on privacy-preserving publication of mixed-attribute datasets, and they allocate the privacy budget unreasonably. This paper therefore proposes DP-IMKP, a differentially private publishing method for mixed-attribute data based on personalized privacy budget allocation. First, using mutual information and the correlations between attributes, a grading strategy for sensitive attributes is proposed that quantifies the importance of each user attribute and matches each attribute level with a corresponding degree of privacy protection. Second, drawing on optimal matching theory, a bipartite graph between privacy budgets and sensitive attributes is constructed so that each level of sensitive attribute is allocated a reasonable privacy budget. Third, drawing on information entropy and density optimization, the initial-center selection and the dissimilarity measure of the classical k-prototype algorithm are improved; the original dataset is then clustered, and the privacy budget allocated to each sensitive attribute is used to apply differential privacy protection to the cluster center values, preventing disclosure of private information. Experimental results show that, compared with similar methods, DP-IMKP has clear advantages in improving data utility and reducing the risk of data leakage.

Key words: differential privacy, k-prototype clustering, attribute classification, privacy budget allocation, mutual information, mixed data
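
The pipeline summarized in the abstract (score attributes with mutual information, match attributes to privacy budgets, then perturb cluster centers) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration: the scoring rule in `sensitivity_scores` (total mutual information with all other attributes), the edge weight `score / epsilon` in `match_budgets`, the brute-force search standing in for a proper optimal-matching algorithm, and the default `sensitivity=1.0`. All function names are hypothetical. The sketch covers numeric center values only and omits the improved k-prototype clustering step.

```python
import itertools
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def sensitivity_scores(columns):
    """Assumed score: total mutual information of an attribute with all others."""
    names = list(columns)
    return {
        a: sum(mutual_information(columns[a], columns[b]) for b in names if b != a)
        for a in names
    }


def match_budgets(scores, budgets):
    """Brute-force maximum-weight bipartite matching of attributes to budgets.

    Assumed edge weight: score / epsilon, so that higher-scoring attributes
    are pushed toward smaller epsilon (stronger protection).
    """
    names = list(scores)
    best = max(
        itertools.permutations(budgets, len(names)),
        key=lambda perm: sum(scores[a] / e for a, e in zip(names, perm)),
    )
    return dict(zip(names, best))


def laplace_noise(scale, rng=random):
    """Laplace(0, scale) sample as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def perturb_center(center, epsilons, sensitivity=1.0, rng=random):
    """Add Laplace noise of scale sensitivity/epsilon to each numeric center value."""
    return {
        a: v + laplace_noise(sensitivity / epsilons[a], rng)
        for a, v in center.items()
    }


if __name__ == "__main__":
    # Toy, integer-coded columns; the attribute names are placeholders.
    columns = {
        "attr_a": [0, 0, 1, 1, 0, 1],
        "attr_b": [0, 1, 1, 1, 0, 1],
        "attr_c": [0, 1, 0, 1, 1, 0],
    }
    eps = match_budgets(sensitivity_scores(columns), [0.2, 0.5, 1.0])
    print(eps)
    center = {"attr_a": 0.5, "attr_b": 0.67, "attr_c": 0.5}
    print(perturb_center(center, eps, rng=random.Random(0)))
```

Brute-force enumeration is only practical for a handful of attributes; a polynomial-time assignment solver would be the natural replacement on wider datasets.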

ZHANG Xing, ZHANG Xing, WANG Qingyang. DP-IMKP: Data Publishing Protection Method for Personalized Differential Privacy[J]. Computer Engineering and Applications, 2023, 59(10): 288-298.
