WKAG: Fraud Detection Method for Imbalanced Medical Insurance Data

DOI: 10.3778/j.issn.1002-8331.2002-0082

Computer Engineering and Applications, 2021, Vol. 57, Issue (9): 247-254.

WU Wenlong, ZHOU Xi, WANG Yi, WANG Baoquan

1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China

Published: 2021-05-01  Online: 2021-04-29

Abstract

Medical insurance fraud detection is of urgent practical importance. Current work relies mainly on machine learning methods and faces two major problems: (1) the data are severely imbalanced, with fraudulent samples making up only a tiny fraction of medical insurance records, which degrades detection performance; (2) the selection and construction of features depend heavily on domain business knowledge, so the effectiveness of the features is hard to guarantee. To address these problems, this paper proposes WKAG, a fraud detection method for imbalanced medical insurance data. WKAG uses WGAN-KDE (Wasserstein Generative Adversarial Network-Kernel Density Estimation) to alleviate the data imbalance, an Auto-Encoder to extract deep hidden features of the data, and a Gradient Boosted Decision Tree (GBDT) to detect medical insurance fraud. The method was validated on multiple public data sets and on a real medical insurance business data set, and the results show that WKAG is an effective method for detecting medical insurance fraud.

Key words: generative adversarial network, imbalanced dataset, auto-encoder feature representation, medical insurance fraud detection, ensemble learning
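The abstract states only that WGAN-KDE is used to alleviate the class imbalance; it does not say how the WGAN and the KDE are combined. The sketch below therefore illustrates just the generic idea behind the KDE half: oversampling the minority (fraud) class by drawing synthetic rows from a Gaussian kernel density estimate fitted to it. It is not the authors' WGAN-KDE and contains no WGAN. The function names `scott_bandwidths`, `kde_oversample` and `balance` are hypothetical, and the bandwidth choice (Scott's rule with a diagonal covariance) is an assumption made for illustration.

```python
import random
from statistics import stdev


def scott_bandwidths(points):
    """Per-feature Gaussian-kernel bandwidths via Scott's rule,
    h_j = sigma_j * n ** (-1 / (d + 4)). Assumes a diagonal covariance."""
    n, d = len(points), len(points[0])
    factor = n ** (-1.0 / (d + 4))
    return [factor * stdev([p[j] for p in points]) for j in range(d)]


def kde_oversample(minority, n_new, seed=0):
    """Draw n_new synthetic rows from a Gaussian KDE fitted to `minority`.

    Sampling from a Gaussian KDE amounts to picking one training row
    uniformly at random and adding zero-mean Gaussian noise whose
    standard deviation is the kernel bandwidth of each feature."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to estimate a bandwidth")
    rng = random.Random(seed)
    h = scott_bandwidths(minority)
    synthetic = []
    for _ in range(n_new):
        centre = rng.choice(minority)
        synthetic.append([x + rng.gauss(0.0, hj) for x, hj in zip(centre, h)])
    return synthetic


def balance(X, y, minority_label=1, seed=0):
    """Binary labels only: return (X, y) extended with KDE samples of the
    minority class until both classes have the same number of rows."""
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_rows = kde_oversample(minority, n_new, seed) if n_new else []
    return X + new_rows, y + [minority_label] * len(new_rows)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data (not from the paper): 95 "normal" claims and 5 "fraud" claims.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(95)]
    X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(5)]
    y = [0] * 95 + [1] * 5
    X_bal, y_bal = balance(X, y)
    print(y_bal.count(0), y_bal.count(1))  # 95 95
```

According to the abstract, WKAG passes the rebalanced data to an auto-encoder for deep feature extraction and then to a GBDT classifier; those stages are not sketched here. Whatever oversampler is used, it is normally applied to the training split only, so that synthetic fraud samples never leak into the evaluation data.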

Cite this article: WU Wenlong, ZHOU Xi, WANG Yi, WANG Baoquan. WKAG: Fraud Detection Method for Imbalanced Medical Insurance Data[J]. Computer Engineering and Applications, 2021, 57(9): 247-254.

