Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (17): 246-251. DOI: 10.3778/j.issn.1002-8331.1805-0345

Click Fraud Detection Method Based on Ensemble Feature Selection

GUO Han, SHUAI Renjun, ZHANG Xin, LI Xin   

  1. College of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
  • Online: 2019-09-01 Published: 2019-08-30

基于集成特征选择的点击欺诈检测方法

郭汉,帅仁俊,张欣,李鑫   

  1. 南京工业大学 计算机科学与技术学院,南京 211816

Abstract: Click fraud in online advertising networks has seriously affected the stable development of online advertising. This paper proposes an ensemble method for detecting click fraud in online advertising that addresses two problems: redundant features reduce training efficiency, and imbalanced data bias the decision boundary. First, the Bagging ensemble method and the Synthetic Minority Oversampling Technique (SMOTE) are combined to raise the share of positive (fraudulent) samples in the training data, reducing the influence of the far more numerous negative samples. Then, a feature selection algorithm based on relevance metrics selects the important features and discards the redundant ones. Finally, a random forest algorithm is used to build the click fraud detection model. The method identifies fraudulent publishers effectively and meets the requirements of click fraud detection in online advertising.

Key words: click fraud, imbalance, ensemble feature selection, Bagging, random forest

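Abstract (translated from the Chinese version): Click fraud carried out to collect advertising fees has seriously affected the stable development of online advertising. Starting from the real-world fraudulent publisher detection dataset provided by the FDMA2012 competition, this paper proposes a click fraud detection method for online advertising based on ensemble feature selection, addressing two problems: redundant features reduce training efficiency, and imbalanced data bias the decision boundary. Bagging is combined with the Synthetic Minority Oversampling Technique (SMOTE) to construct multiple bagged subsets from the majority class of normal publisher samples and the minority class of fraudulent publisher samples. A feature selection algorithm based on correlation measures selects a feature subset from each bagged subset, a threshold is set to obtain the combined feature set, and a random forest algorithm is used to build the click fraud detection model. Experimental results show that the method effectively identifies illegitimate publishers that commit click fraud and meets the requirements of click fraud detection in online advertising.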
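
The pipeline outlined in the abstract (bagged subsets balanced with SMOTE, per-bag feature selection, then a threshold over the bags) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the relevance metric is taken to be the absolute Pearson correlation with the class label, only the majority class is bootstrapped while every minority sample is kept in each bag, and the names `smote`, `select_features`, `ensemble_feature_selection`, `top_m` and `vote_threshold` are hypothetical.

```python
import math
import random


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority points by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not x),
                            key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def select_features(X, y, top_m):
    """Assumed relevance metric: keep the top_m features ranked by
    absolute correlation with the label."""
    n_feat = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_feat)]
    ranked = sorted(range(n_feat), key=lambda j: scores[j], reverse=True)
    return set(ranked[:top_m])


def ensemble_feature_selection(majority, minority, n_bags=10, top_m=1,
                               vote_threshold=0.5, k=3, seed=0):
    """Return indices of features selected in at least vote_threshold
    of the bags."""
    rng = random.Random(seed)
    bag_size = len(majority)
    votes = [0] * len(majority[0])
    for _ in range(n_bags):
        # Bootstrap the majority (normal) class with replacement.
        neg = [rng.choice(majority) for _ in range(bag_size)]
        # Keep all minority (fraud) samples and oversample up to balance.
        n_new = max(0, bag_size - len(minority))
        pos = minority + smote(minority, n_new, k, rng)
        X = neg + pos
        y = [0] * len(neg) + [1] * len(pos)
        for j in select_features(X, y, top_m):
            votes[j] += 1
    return [j for j, v in enumerate(votes) if v / n_bags >= vote_threshold]


# Toy data (made up): feature 0 separates the classes, features 1-2 do not.
majority = [[i % 5, (i * 7) % 11, (i * 3) % 4] for i in range(40)]
minority = [[20 + i % 3, (i * 5) % 11, (i * 3) % 4] for i in range(8)]
selected = ensemble_feature_selection(majority, minority)
print(selected)
```

In a full pipeline, the columns listed in `selected` would then be used to train a random forest classifier (for example scikit-learn's `RandomForestClassifier`), which corresponds to the final modelling step the abstract describes.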