Personalized Federal Learning Framework for Multi-Source Data

doi:10.3778/j.issn.1002-8331.2306-0366

Abstract

Abstract: In federated learning, the central server aggregates models uploaded by different clients after they have been perturbed with differential privacy noise. The magnitude of the added noise and the allocation of the privacy budget directly affect the usability of the model. Most existing studies assume balanced data and a fixed privacy budget, which makes it difficult to trade off accuracy against the level of privacy protection when handling unbalanced data from multiple sources. To address this problem, a federated learning framework with adaptive differential privacy noise addition is proposed. The framework uses a contribution proof algorithm based on the Shapley value to compute the contribution of clients with different data sources and, according to that contribution, adds differentiated differential privacy noise to each client's gradient updates, thereby achieving personalized privacy protection. Theoretical and experimental analyses show that, on multi-source unbalanced data, the framework provides finer-grained privacy protection for different participants and also outperforms the traditional FL-DP algorithm by 1.3 percentage points in model performance.

Key words: federated learning, differential privacy, Shapley value, unbalanced data
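
The two ideas named in the abstract, Shapley-value contribution scoring and contribution-dependent noise, can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives neither the utility function nor the rule that maps contribution to noise. The sketch assumes a toy coalition utility (square root of the total sample count), a linear mapping from contribution to a per-client privacy budget in which a higher contribution receives a larger epsilon and therefore less noise, and the classic Gaussian-mechanism calibration with the clipping norm taken as the sensitivity. All function names and parameter values are hypothetical.

```python
import itertools
import math
import random


def shapley_values(clients, utility):
    """Exact Shapley value of every client for a coalition utility function.

    Enumerates all coalitions, so it is only practical for a handful of clients.
    """
    n = len(clients)
    values = {c: 0.0 for c in clients}
    for c in clients:
        others = [x for x in clients if x != c]
        for k in range(n):
            # Shapley weight of a coalition of size k that excludes client c.
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                coalition = frozenset(subset)
                values[c] += weight * (utility(coalition | {c}) - utility(coalition))
    return values


def epsilon_from_contribution(values, eps_min=0.2, eps_max=0.9):
    """ASSUMPTION: linearly map contribution onto [eps_min, eps_max]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        return {c: (eps_min + eps_max) / 2 for c in values}
    return {c: eps_min + (v - lo) / span * (eps_max - eps_min) for c, v in values.items()}


def gaussian_sigma(epsilon, delta, clip_norm):
    """Classic Gaussian-mechanism noise scale (valid for 0 < epsilon < 1).

    ASSUMPTION: the L2 sensitivity of a clipped update equals clip_norm.
    """
    return clip_norm * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def perturb(gradient, clip_norm, sigma, rng):
    """Clip a gradient to L2 norm clip_norm, then add Gaussian noise."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale + rng.gauss(0.0, sigma) for g in gradient]


# Toy setting: three clients with unbalanced sample counts (hypothetical numbers).
sample_counts = {"client_a": 100, "client_b": 400, "client_c": 900}


def toy_utility(coalition):
    # ASSUMPTION: model quality grows with diminishing returns in data volume.
    return math.sqrt(sum(sample_counts[c] for c in coalition))


contributions = shapley_values(list(sample_counts), toy_utility)
budgets = epsilon_from_contribution(contributions)
rng = random.Random(0)
for client, eps in budgets.items():
    sigma = gaussian_sigma(eps, delta=1e-5, clip_norm=1.0)
    noisy = perturb([0.3, -1.2, 0.8], clip_norm=1.0, sigma=sigma, rng=rng)
    print(f"{client}: contribution={contributions[client]:.2f} "
          f"epsilon={eps:.2f} sigma={sigma:.2f} noisy_len={len(noisy)}")
```

In this sketch the client with the largest contribution gets the largest epsilon and so the smallest noise scale. Whether the paper rewards high contributors this way, or instead protects them more strongly, cannot be determined from the abstract alone, so the direction of the mapping should be treated as a placeholder.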

PEI Langtao, CHEN Xuebin, REN Zhiqiang, ZHAI Ran. Personalized Federal Learning Framework for Multi-Source Data[J]. Computer Engineering and Applications, 2024, 60(19): 278-287.

裴浪涛, 陈学斌, 任志强, 翟冉. 面向多源数据的个性化联邦学习框架[J]. 计算机工程与应用, 2024, 60(19): 278-287.
