Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (1): 180-184. DOI: 10.3778/j.issn.1002-8331.1906-0192

• Pattern Recognition and Artificial Intelligence •

Research on User Portrait of Improved Word Vector Model

CHEN Zeyu, HUANG Bo

  1. School of Electrical and Electronic Engineering, Shanghai University of Engineering and Technology, Shanghai 201620, China
  2. Jiangxi Collaborative Innovation Center for Economic Crime Detection and Prevention and Control, Nanchang 330000, China
  • Online: 2020-01-01 Published: 2020-01-02

Abstract: User portrait technology can bring great commercial value to enterprises. Word vectors give a semantic-level representation of a user's historical query words, but a word vector model generates the same vector for every occurrence of a word, so it cannot handle polysemy well. This paper therefore uses the LDA topic model to assign a topic to each query word, then feeds each query word together with its topic into a neural network model to learn topical word vectors. Finally, a random forest classification algorithm classifies users' basic attributes to build the user portrait. The experimental results show that the classification accuracy of this model is higher than that of the word vector model.

Key words: user portrait, word vector, LDA topic model, random forest
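
The central idea of the abstract, pairing each query word with an LDA-assigned topic so that a polysemous word can receive one vector per sense, can be illustrated with a small sketch. The abstract does not say how words and topics are combined, so the sketch assumes the simplest scheme: fuse them into `word#topic` pseudo-tokens and emit skip-gram (center, context) pairs that a word2vec-style trainer could consume. The function names, the `#` separator, the toy queries, and the hand-written topic ids (standing in for real LDA output) are all hypothetical.

```python
from typing import List, Tuple

# A query is a list of (word, topic_id) pairs. In practice the topic ids
# would come from LDA inference; here they are hand-assigned (assumption).
TaggedQuery = List[Tuple[str, int]]


def to_topical_tokens(query: TaggedQuery, sep: str = "#") -> List[str]:
    """Fuse each (word, topic) pair into one pseudo-token such as 'apple#0'."""
    return [f"{word}{sep}{topic}" for word, topic in query]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Emit (center, context) pairs within `window` positions of each token."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Toy data: the word "apple" appears under two different topics.
queries: List[TaggedQuery] = [
    [("apple", 0), ("phone", 0), ("price", 0)],  # topic 0: electronics
    [("apple", 1), ("pie", 1), ("recipe", 1)],   # topic 1: cooking
]

topical = [to_topical_tokens(q) for q in queries]
vocab = sorted({tok for q in topical for tok in q})
pairs = [p for q in topical for p in skipgram_pairs(q, window=1)]
```

In this toy vocabulary `apple` appears twice, as `apple#0` and `apple#1`, so a downstream embedding model would learn separate vectors for the two senses. One plausible next step, again an assumption, is to average a user's topical vectors into a feature vector for the random forest classifier.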