Research on User Portrait of Improved Word Vector Model

doi:10.3778/j.issn.1002-8331.1906-0192

Computer Engineering and Applications, 2020, Vol. 56, Issue (1): 180-184.

CHEN Zeyu, HUANG Bo

1. School of Electrical and Electronic Engineering, Shanghai University of Engineering and Technology, Shanghai 201620, China
2. Jiangxi Collaborative Innovation Center for Economic Crime Detection and Prevention and Control, Nanchang 330000, China

Online: 2020-01-01 Published: 2020-01-02

Abstract

User portrait technology can bring great commercial value to enterprises. Word vectors can capture the semantics of a user's historical query words, but a word vector model generates the same vector for a given word in every context, so it cannot handle polysemy well. This paper therefore uses the LDA topic model to assign a topic to each query word, then feeds each query word together with its topic into a neural network model to learn topical word vectors. Finally, a random forest classification algorithm is used to classify users' basic attributes and build the user portrait. Experimental results show that the classification accuracy of this model is higher than that of the word vector model.

Key words: user portrait, word vector, LDA topic model, random forest
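
The polysemy argument in the abstract can be illustrated with a minimal sketch. It assumes that topic assignments for each query word are already available (for example, from a trained LDA model) and joins each word with its topic into a single token, so that each sense of a word gets its own vocabulary entry and therefore its own vector. The `word#topic` token scheme, the sample queries, and the function names are hypothetical and chosen only for illustration; the abstract does not specify how the paper combines words and topics inside the neural network, and the embedding training and random forest steps are omitted here.

```python
# Hypothetical per-word topic assignments, as an LDA model might produce them.
# Each user's query history is a list of (word, topic_id) pairs.
tagged_queries = {
    "user_a": [("apple", 0), ("iphone", 0), ("price", 0)],
    "user_b": [("apple", 1), ("juice", 1), ("recipe", 1)],
}


def to_topical_tokens(pairs):
    """Join each word with its topic id so each sense becomes its own token."""
    return [f"{word}#{topic}" for word, topic in pairs]


def build_vocab(corpus):
    """Map every distinct token to an integer index, in order of first use."""
    vocab = {}
    for tokens in corpus:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


# Plain vocabulary: one entry (hence one vector) per surface word.
plain_vocab = build_vocab([[w for w, _ in p] for p in tagged_queries.values()])

# Topical vocabulary: one entry per (word, topic) pair.
topical_corpus = {u: to_topical_tokens(p) for u, p in tagged_queries.items()}
topical_vocab = build_vocab(topical_corpus.values())

print(len(plain_vocab), len(topical_vocab))
```

In the plain vocabulary "apple" occupies a single index, so both users would share one vector for it. In the topical vocabulary `apple#0` and `apple#1` are separate entries, which is what lets an embedding model trained on these tokens learn a different vector for each sense.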

CHEN Zeyu, HUANG Bo. Research on User Portrait of Improved Word Vector Model[J]. Computer Engineering and Applications, 2020, 56(1): 180-184.
