Computer Engineering and Applications, 2022, Vol. 58, Issue (11): 302-312. DOI: 10.3778/j.issn.1002-8331.2010-0238

• Engineering and Applications •

Data Feature Selection Method and Empirical Study of Bank Customer Segmentation

DUAN Ganglong, WANG Yan, MA Xin, YANG Zeyang

  1. School of Economics and Management, Xi’an University of Technology, Xi’an 710054, China
  • Online: 2022-06-01 Published: 2022-06-01

Abstract: Bank customer data are high-dimensional, large in volume, and contain many redundant features. To address these problems, this paper proposes a comprehensive feature selection method that draws on the idea of multimodal fusion: it filters out redundant features by calculating and comparing the comprehensive contribution of each feature in the dataset. First, based on the characteristics of real bank customer data, a three-part preprocessing scheme (type conversion and discretization, missing-value filling, and standardization) is presented and applied to a real bank customer dataset. Second, four methods, namely the Pearson correlation coefficient, random forest, quantitative prior cognition, and the proposed multimodal comprehensive feature selection method, are used to filter redundant features from the preprocessed dataset; they select 14, 8, 15, and 11 features, respectively. Finally, the feature selection results of the four methods are compared in detail at both the qualitative and quantitative levels. The experimental results show that the proposed method effectively compensates for the shortcomings of the individual feature selection methods, reduces data dimensionality, and thereby improves the performance of the bank customer classification model.

Key words: customer segmentation, feature selection, knowledge mining, quantitative prior cognition, multimodal
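The abstract does not give the formula for the comprehensive contribution, so the Python sketch below illustrates only one plausible reading of the idea, not the authors' method. It assumes each of the three views (absolute Pearson correlation with the class label, random-forest importance, and a quantified prior-cognition score) is min-max normalized, the normalized scores are combined by a weighted average, and the top k features are kept. The mean filling, z-score standardization, equal weights, the value of k, and every function and feature name (`fill_missing`, `comprehensive_contribution`, `balance`, and so on) are hypothetical. Random-forest importances and prior scores are passed in as made-up placeholders rather than computed, and the type-conversion/discretization step is omitted.

```python
import statistics
from typing import Dict, List, Optional


def fill_missing(column: List[Optional[float]]) -> List[float]:
    """Replace None with the column mean (assumed fill strategy).

    Assumes at least one observed value in the column.
    """
    present = [v for v in column if v is not None]
    mean = statistics.fmean(present)
    return [mean if v is None else v for v in column]


def standardize(column: List[float]) -> List[float]:
    """Z-score standardization using the population standard deviation."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - mu) / sigma for v in column]


def pearson_abs(x: List[float], y: List[float]) -> float:
    """Absolute Pearson correlation between one feature and the class label."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / (sxx * syy) ** 0.5)


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one view's scores to [0, 1] so the views become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def comprehensive_contribution(views: List[Dict[str, float]],
                               weights: List[float]) -> Dict[str, float]:
    """Weighted average of normalized per-view scores (hypothetical fusion rule).

    Assumes every view scores the same set of features.
    """
    normalized = [min_max(v) for v in views]
    total = sum(weights)
    return {
        f: sum(w * n[f] for w, n in zip(weights, normalized)) / total
        for f in views[0]
    }


def select_features(contrib: Dict[str, float], k: int) -> List[str]:
    """Keep the k features with the highest contribution (ties broken by name)."""
    return sorted(contrib, key=lambda f: (-contrib[f], f))[:k]


if __name__ == "__main__":
    # Toy data: three hypothetical customer features and a binary label.
    raw = {
        "balance": [1.0, 2.0, None, 4.0, 5.0],
        "age": [30.0, 45.0, 28.0, None, 50.0],
        "txn_count": [3.0, 3.0, 4.0, 3.0, 4.0],
    }
    label = [0.0, 0.0, 1.0, 1.0, 1.0]
    data = {f: standardize(fill_missing(col)) for f, col in raw.items()}

    pearson_view = {f: pearson_abs(col, label) for f, col in data.items()}
    # Placeholders: in practice these would come from a trained random forest
    # and from quantified expert judgement; the numbers here are made up.
    forest_view = {"balance": 0.5, "age": 0.2, "txn_count": 0.3}
    prior_view = {"balance": 0.9, "age": 0.4, "txn_count": 0.6}

    contrib = comprehensive_contribution(
        [pearson_view, forest_view, prior_view], weights=[1.0, 1.0, 1.0])
    print(select_features(contrib, k=2))
```

Min-max normalization is applied per view because the three views live on different scales (correlations in [0, 1], importances that sum to 1, expert ratings on an arbitrary scale); without it, whichever view has the widest range would dominate the weighted sum.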