Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (24): 214-221. DOI: 10.3778/j.issn.1002-8331.1808-0420

• Engineering and Applications •

User Churn Prediction in Telecom Domain Using Hybrid Model

WANG Mingda (汪明达), ZHOU Qiaoli (周俏丽), CAI Dongfeng (蔡东风)

  1. Research Center for Human-Computer Intelligence, Shenyang Aerospace University, Shenyang 110136, China
  • Online: 2019-12-15 Published: 2019-12-11

Abstract: User churn prediction helps companies reduce customer loss and is important for their revenue and competitiveness. However, because telecom data are sparse and imbalanced, most work on telecom churn prediction, both in China and abroad, remains at the research stage and has not yet been applied in production. This paper proposes a hybrid model that combines neural networks and machine learning with naive random oversampling and voting to predict churning users in the telecom domain. The dataset is the KDD Cup 2009 competition data, provided by the French telecom operator Orange. Under 10-fold cross-validation, a first round of voting classification with AdaBoost and Gradient Boosting reaches an AUC of 0.6771. When other models then perform a second round of voting classification on the list of churners predicted by the hybrid model, the prediction accuracy for the top 200 high-risk users reaches 31.8%. The experimental results show that combining naive random oversampling with voting effectively improves the accuracy of the model.

Key words: neural network, machine learning, naive random oversampling, voting classification
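
As an illustration only, the three ideas named in the abstract (naive random oversampling, voting across models, and accuracy among the top-ranked users) can be sketched in plain Python. This is not the authors' implementation. The function names `naive_random_oversample`, `soft_vote`, and `precision_at_k` are hypothetical, and averaging the models' scores (soft voting) is an assumption, since the abstract does not say how votes are combined. The toy numbers are invented and are not KDD Cup 2009 data.

```python
import random
from collections import Counter


def naive_random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


def soft_vote(score_lists):
    """Average the per-user churn scores produced by several models.
    (Assumption: the paper's voting scheme may differ.)"""
    n_models = len(score_lists)
    return [sum(scores) / n_models for scores in zip(*score_lists)]


def precision_at_k(scores, labels, k):
    """Fraction of true churners (label 1) among the k highest-scored users."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:k]
    return sum(labels[i] for i in top) / len(top)


# Toy imbalanced training set: five non-churners (0), one churner (1).
train_rows = [[0.1], [0.4], [0.35], [0.8], [0.9], [0.2]]
train_labels = [0, 0, 0, 0, 1, 0]
balanced_rows, balanced_labels = naive_random_oversample(train_rows, train_labels)

# Toy scores from two hypothetical models (stand-ins for, e.g., AdaBoost
# and Gradient Boosting) for four held-out users.
model_a_scores = [0.0, 1.0, 0.5, 0.25]
model_b_scores = [0.25, 0.5, 0.75, 0.25]
true_labels = [0, 1, 0, 1]

combined = soft_vote([model_a_scores, model_b_scores])
top2_precision = precision_at_k(combined, true_labels, k=2)
```

In a real pipeline the oversampling step would be applied only to the training folds inside cross-validation, so that duplicated minority rows never leak into the validation fold. With `k=200`, `precision_at_k` corresponds to the kind of top-200 measure the abstract reports.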