Computer Engineering and Applications, 2023, Vol. 59, Issue (21): 1-25. DOI: 10.3778/j.issn.1002-8331.2302-0129

• Hot Topics and Reviews •

Survey on Credit Card Transaction Fraud Detection Based on Machine Learning

JIANG Hongxun, JIANG Junyi, LIANG Xun

  1. School of Information, Renmin University of China, Beijing 100872, China
  • Online: 2023-11-01 Published: 2023-11-01

Abstract: Machine learning for credit card transaction fraud detection has distinctive characteristics and operates in a more complex environment than most applications. Because human intelligence is actively involved on the fraudster's side, defeating credit card transaction fraud is harder than engineering problems such as face recognition and autonomous driving, and simply transplanting machine learning methods from engineering disciplines often fails. This paper reviews research since 2000 on machine learning based credit card fraud detection and clarifies the field's research scope, application scenarios, technical streams, and related concepts, together with their interconnections. It breaks down the general research architecture of machine learning fraud detection and summarizes the latest progress in three stages: feature engineering, models and algorithms, and evaluation metrics. Organized by whether the dataset is labeled, it surveys three mainstream classes of fraud detection models (supervised, unsupervised, and semi-supervised) and discusses their motivations, core ideas, solution methods, advantages and disadvantages, and relevant extensions, as well as ensemble approaches such as model cascading and aggregation. It also analyzes reinforcement learning models that simulate the dynamic game between fraudsters and financial institutions. It then examines three major difficulties for machine learning in this setting, namely massive data, sample skew, and concept drift, and compiles recent progress in alleviating them. The paper concludes with the current limitations, controversies, and challenges of machine learning research on fraud detection, and offers a trend analysis and suggestions for future research directions.

Key words: credit card fraud detection, machine learning, data mining, sample skew, concept drift