Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (14): 118-122. DOI: 10.3778/j.issn.1002-8331.1904-0395

• Pattern Recognition and Artificial Intelligence •

Short Video Preference Rate Prediction Model with Integrated FM

WANG Limiao (王丽苗), XU Qinglin (许青林), JIANG Wenchao (姜文超), FU Jigao (符基高)

  1. College of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2020-07-15 Published: 2020-07-14

Abstract:

Short video preference rate prediction often has to cope with very large numbers of users and advertisements together with high-dimensional, sparse training data, which lowers prediction accuracy. To address these problems, a short video preference rate prediction model based on LDA-GBDT-FM is proposed. The model first uses Latent Dirichlet Allocation (LDA) to partition the original dataset by topic. For the sub-training set of each topic, it then uses a Gradient Boosting Decision Tree (GBDT) to extract high-impact features from the continuous features and merges them with the discrete features to train a Factorization Machine (FM) model. Finally, the sub-models are combined to predict the preference rate of short videos. Experiments on a dataset from Bytedance show that the proposed LDA-GBDT-FM model improves the prediction metric by 3.0%, 5.7%, and 8.5% over LDA-FM, FM, and LR, respectively.

Key words: short video advertisement, preference rate prediction, topic model, gradient boosting decision tree, factorization machine
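
The abstract names the Factorization Machine (FM) as the final predictor but does not give its formulation. As a rough illustration only, the sketch below scores one feature vector with the standard second-order FM (global bias, linear weights, and pairwise interactions through latent factor vectors) and checks the usual linear-time rewrite of the pairwise term against the explicit sum. The function names, the toy parameters, and the sigmoid used to turn the score into a rate are assumptions made for this example, not details taken from the paper.

```python
import math
from itertools import combinations


def fm_score_naive(x, w0, w, V):
    """Second-order FM score using the explicit pairwise sum, O(k * n^2)."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i, j in combinations(range(len(x)), 2):
        # Interaction weight is the dot product of the two latent vectors.
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


def fm_score_fast(x, w0, w, V):
    """Same score in O(k * n), using the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        total = sum(terms)
        pairwise += 0.5 * (total * total - sum(t * t for t in terms))
    return w0 + linear + pairwise


def preference_rate(x, w0, w, V):
    """Map the FM score to (0, 1) with a sigmoid (assumed for illustration)."""
    return 1.0 / (1.0 + math.exp(-fm_score_fast(x, w0, w, V)))


# Hypothetical toy parameters: 4 features, 2 latent factors per feature.
x = [1.0, 0.0, 1.0, 0.5]
w0 = 0.1
w = [0.2, -0.3, 0.05, 0.4]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.1]]

print(fm_score_naive(x, w0, w, V), fm_score_fast(x, w0, w, V))
print(preference_rate(x, w0, w, V))
```

The factorized interaction weights are what make FM workable on the sparse, high-dimensional inputs the abstract describes: each feature learns a short latent vector rather than a separate weight for every feature pair. In the pipeline outlined above, one such model would be trained per LDA topic on the GBDT-derived features merged with the discrete features.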