Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (15): 329-342. DOI: 10.3778/j.issn.1002-8331.2404-0133

• Engineering and Applications •

Customer Churn Prediction Method with PCA+GWO Integrated Feature Selection and Model Stacking

LIU Mei, ZHENG Lijun, DUAN Yongliang, DUAN Hongxiu

  1. School of Media Technology, Communication University of China, Nanjing, Nanjing 210000, China
  2. Department of UPF and Cloudified Network Element, China Telecom Intelligent Network Technology Company, Nanjing 210000, China
  • Online: 2025-08-01 Published: 2025-07-31

Abstract: A stable long-term customer base is important to hotel revenue and competitiveness. In customer churn prediction, data collected in production environments tends to be large, high-dimensional, and noisy, which degrades the accuracy, stability, and generalization ability of machine learning models. To address these problems, an integrated feature selection method based on principal component analysis (PCA) and the grey wolf optimizer (GWO) is designed, and a customer churn prediction model is built with model stacking. A method is proposed that uses the Pearson correlation coefficient and random forest (RF) feature importance to determine which groups of features should undergo dimensionality reduction. The position update mechanism and convergence criterion of GWO are modified to suit the integrated feature selection method, and the improved algorithm is used to select the optimal feature subset. Ten different machine learning models are trained, and the best performers by F1-score are chosen as base models for training the meta-model. Experiments on a hotel customer information dataset show that the improved GWO converges significantly faster and that the prediction model reaches an F1-score of 97.9%. The model shows strong generalization ability.

Key words: feature selection, random forest (RF), principal component analysis (PCA), grey wolf optimization (GWO) algorithm, model stacking
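
To make the GWO-based feature subset search mentioned in the abstract more concrete, the following is a minimal sketch of the standard grey wolf optimizer used as a wrapper for feature selection. It is not the improved variant proposed in the paper, whose position update and convergence criterion are not given on this page. The names `gwo_select`, `to_mask`, `toy_fitness`, and `INFORMATIVE` are hypothetical, the 0.5 threshold used to turn a position into a feature mask is an assumption, and the toy fitness function stands in for what would in practice be a cross-validated score such as the F1-score of a classifier trained on the selected features.

```python
import random

# Hypothetical ground truth for the toy fitness: indices of useful features.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    """Toy stand-in for a model score: reward useful features, penalize extras."""
    chosen = {i for i, keep in enumerate(mask) if keep}
    hits = len(chosen & INFORMATIVE)
    extras = len(chosen - INFORMATIVE)
    return hits / len(INFORMATIVE) - 0.1 * extras


def to_mask(position, threshold=0.5):
    """Assumed binarization: a feature is kept when its coordinate exceeds 0.5."""
    return [x > threshold for x in position]


def gwo_select(fitness, n_features, n_wolves=10, n_iter=50, seed=0):
    """Standard GWO (maximization) over positions in [0, 1]^n_features."""
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]
    leaders = []   # up to three best (score, position) pairs seen so far
    history = []   # best score seen so far, recorded once per evaluation round

    def rank(current_leaders):
        # Merge the current pack with earlier leaders and keep the top three
        # (alpha, beta, delta). Positions are copied so later moves do not alter them.
        scored = [(fitness(to_mask(w)), list(w)) for w in wolves]
        merged = sorted(current_leaders + scored, key=lambda s: s[0], reverse=True)
        return merged[:3]

    for t in range(n_iter):
        leaders = rank(leaders)
        history.append(leaders[0][0])
        a = 2.0 * (1 - t / n_iter)  # decreases linearly from 2 towards 0
        for w in wolves:
            for j in range(n_features):
                estimate = 0.0
                for _, lead in leaders:
                    r1, r2 = rng.random(), rng.random()
                    A = 2 * a * r1 - a
                    C = 2 * r2
                    D = abs(C * lead[j] - w[j])
                    estimate += lead[j] - A * D
                # Average the leader-guided estimates and clamp to [0, 1].
                w[j] = min(1.0, max(0.0, estimate / len(leaders)))

    leaders = rank(leaders)  # evaluate the pack once more after the last move
    history.append(leaders[0][0])
    return to_mask(leaders[0][1]), history


mask, history = gwo_select(toy_fitness, n_features=8)
selected = [i for i, keep in enumerate(mask) if keep]
print("selected features:", selected, "best fitness:", history[-1])
```

Because the three leaders are carried over between iterations, the best score in `history` never decreases, which gives a simple way to compare convergence speed between variants of the update rule.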