Differential Privacy Clustering Algorithm Based on Spatial Dynamic Partition

Computer Engineering and Applications, 2021, Vol. 57, Issue (2): 97-103. DOI: 10.3778/j.issn.1002-8331.1912-0215

ZHANG Kehua, CHENG Weiqing

1. School of Computer, Nanjing University of Posts & Telecommunications, Nanjing 210023, China
2. Key Laboratory of Computer Network & Information Integration of Ministry of Education, Southeast University, Nanjing 211189, China

Online: 2021-01-15 Published: 2021-01-14

Abstract:

Differential privacy is one of the most widely studied privacy protection mechanisms and has broad application. Several k-means clustering algorithms based on differential privacy exist; they target different application scenarios, and each has shortcomings. Previous algorithms partition the data set evenly and construct equal-width histograms for clustering, so noise is inserted indiscriminately even into regions that contain no data, which degrades clustering performance. To address this, a new differential privacy clustering algorithm, DPQTk-means, is proposed. It first builds a differentially private quadtree that dynamically partitions the data space into adaptive buckets of different sizes, representing the data set fully while reducing noise insertion, and then performs k-means clustering. The algorithm is proved to satisfy ε-differential privacy. Experimental results show that DPQTk-means achieves better clustering utility than previous differential privacy clustering algorithms and maintains stable clustering performance at a high level of privacy protection.

Key words: differential privacy, quadtree, dynamic partition, k-means
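The quadtree-then-cluster idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the even per-level split of the privacy budget, the noisy-count split threshold, and the use of noisy leaf counts as k-means weights are all assumptions made for illustration, and the paper's actual budget allocation and splitting rule may differ. Points are assumed to lie in the unit square [0, 1) x [0, 1).

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_leaves(points, box, depth, max_depth, eps_level, threshold, rng):
    """Return leaf buckets as ((cx, cy), noisy_count) pairs (hypothetical helper)."""
    x0, y0, x1, y1 = box
    # A count query has sensitivity 1, so the Laplace scale is 1 / eps_level.
    noisy = len(points) + laplace(1.0 / eps_level, rng)
    # Sparse or maximally deep regions stay as one coarse bucket.
    if depth == max_depth or noisy <= threshold:
        return [(((x0 + x1) / 2, (y0 + y1) / 2), max(noisy, 0.0))]
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    leaves = []
    for bx0, by0, bx1, by1 in ((x0, y0, mx, my), (mx, y0, x1, my),
                               (x0, my, mx, y1), (mx, my, x1, y1)):
        inside = [p for p in points if bx0 <= p[0] < bx1 and by0 <= p[1] < by1]
        leaves += build_leaves(inside, (bx0, by0, bx1, by1), depth + 1,
                               max_depth, eps_level, threshold, rng)
    return leaves


def weighted_kmeans(buckets, k, iters, rng):
    """Lloyd iterations over bucket centres weighted by their noisy counts."""
    if len(buckets) < k:
        raise ValueError("fewer buckets than clusters")
    centers = [c for c, _ in rng.sample(buckets, k)]
    for _ in range(iters):
        sums = [[0.0, 0.0, 0.0] for _ in range(k)]
        for (x, y), w in buckets:
            j = min(range(k), key=lambda i: (x - centers[i][0]) ** 2
                    + (y - centers[i][1]) ** 2)
            sums[j][0] += w * x
            sums[j][1] += w * y
            sums[j][2] += w
        # Keep the old centre if a cluster received no weight.
        centers = [(s[0] / s[2], s[1] / s[2]) if s[2] > 0 else centers[i]
                   for i, s in enumerate(sums)]
    return centers


def dp_quadtree_kmeans(points, k, epsilon, max_depth, threshold, rng, iters=20):
    # Each point appears in one node per level, so splitting epsilon evenly
    # over the (max_depth + 1) levels keeps the total budget at epsilon.
    eps_level = epsilon / (max_depth + 1)
    buckets = build_leaves(points, (0.0, 0.0, 1.0, 1.0), 0, max_depth,
                           eps_level, threshold, rng)
    # Clustering only touches the noisy buckets, so it is post-processing
    # and consumes no additional privacy budget.
    return buckets, weighted_kmeans(buckets, k, iters, rng)
```

In this sketch, regions whose noisy count falls at or below the threshold are never subdivided, so empty areas contribute one coarse bucket instead of many noisy fine-grained cells, which is the intuition behind replacing an equal-width histogram with an adaptive partition.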

Cite this article: ZHANG Kehua, CHENG Weiqing. Differential Privacy Clustering Algorithm Based on Spatial Dynamic Partition[J]. Computer Engineering and Applications, 2021, 57(2): 97-103.

