Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (19): 278-287. DOI: 10.3778/j.issn.1002-8331.2306-0366

• Network, Communication and Security •


Personalized Federated Learning Framework for Multi-Source Data

PEI Langtao, CHEN Xuebin, REN Zhiqiang, ZHAI Ran   

  1. School of Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
    2. Hebei Province Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan, Hebei 063210, China
    3. Tangshan Key Laboratory of Data Science (North China University of Science and Technology), Tangshan, Hebei 063210, China
  • Online: 2024-10-01  Published: 2024-09-30


Abstract: In federated learning, the central server aggregates models that clients have perturbed with differential privacy, so the magnitude of the added noise and the allocation of the privacy budget directly determine the usability of the model. Most existing studies assume balanced data and a fixed privacy budget, and therefore struggle to trade off accuracy against the level of privacy protection when handling imbalanced data from multiple sources. To address this problem, a federated learning framework with adaptive differential privacy noise addition is proposed. The framework adopts a contribution-proof algorithm based on the Shapley value to compute the contribution of clients holding different data sources and, according to each client's contribution, adds differentiated differential privacy noise during gradient updates, thereby achieving personalized privacy protection. Theoretical and experimental analyses show that, when facing multi-source imbalanced data, the framework not only provides a more fine-grained level of privacy protection for different participants, but also outperforms the traditional FL-DP algorithm by 1.3 percentage points in model performance.
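To make the abstract's pipeline concrete, the following is a minimal Python sketch, not the authors' implementation: it computes exact Shapley values over a small client set, splits a total privacy budget in proportion to contribution (one plausible reading of "differentiated noise based on contribution"; the paper's exact allocation rule is not given here), and applies the classic Gaussian mechanism to a clipped gradient. The utility function, the proportional allocation rule, and all parameter names are illustrative assumptions.

```python
import math
from itertools import combinations

import numpy as np

def shapley_values(clients, utility):
    """Exact Shapley value of each client by enumerating all coalitions.

    `utility` maps a frozenset of client ids to a real-valued score
    (e.g., validation accuracy of a model trained on that coalition's
    data). Exponential in len(clients); practical only for small
    federations -- large-scale use would need sampling approximations.
    """
    n = len(clients)
    phi = {c: 0.0 for c in clients}
    for c in clients:
        others = [x for x in clients if x != c]
        for k in range(n):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Weight of a size-k coalition in the Shapley formula.
                weight = (math.factorial(k) * math.factorial(n - k - 1)
                          / math.factorial(n))
                phi[c] += weight * (utility(s | {c}) - utility(s))
    return phi

def allocate_budgets(phi, total_epsilon):
    """Split a total privacy budget in proportion to contribution.

    Negative contributions are floored at zero before normalizing.
    A larger epsilon share means less noise for that client.
    """
    shifted = {c: max(v, 0.0) for c, v in phi.items()}
    z = sum(shifted.values()) or 1.0
    return {c: total_epsilon * v / z for c, v in shifted.items()}

def gaussian_perturb(grad, epsilon, delta=1e-5, clip=1.0, rng=None):
    """Clip a gradient update to L2 norm `clip`, then add Gaussian noise
    calibrated to (epsilon, delta)-DP for a single release, with sigma
    from the standard Gaussian-mechanism bound. `epsilon` must be > 0.
    """
    rng = rng or np.random.default_rng(0)
    g = np.asarray(grad, dtype=float)
    norm = np.linalg.norm(g)
    if norm > clip:
        g = g * (clip / norm)
    sigma = clip * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return g + rng.normal(0.0, sigma, size=g.shape)
```

With an additive toy utility (coalition score = sum of member data sizes), each client's Shapley value reduces to its own size, so a client holding more of the imbalanced data receives a larger epsilon share and hence lighter perturbation of its gradient update.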

Key words: federated learning, differential privacy, Shapley value, unbalanced data