Fake Reviews Detection Method Based on GCN

doi:10.3778/j.issn.1002-8331.2008-0125

Abstract

User reviews on service websites are an important reference for consumer choice. Driven by commercial interests, review sites are flooded with reviews that do not reflect the true characteristics of products, so detecting and curbing fake reviews matters for supervising website operations and cleaning up the online environment. To improve fake review detection, this paper builds on text classification with graph neural networks constructed from words and documents, and proposes a detection method based on a graph convolutional network that incorporates semantic similarity (semantic-graph convolution networks, Sem-GCN). Word-word edges are built from the pointwise mutual information (PMI) index and from a semantic similarity measured on word embeddings; word-review edges are built from TF-IDF feature values. The propagation property of graph neural networks is then used to aggregate and extract node feature information in the resulting word-review heterogeneous text graph, capturing high-order feature information between word nodes and review nodes for classification. On a public dataset, the proposed method improves accuracy by 7%, 4.8% and 1.3% over CNN, LSTM and Text-GCN, respectively.

Key words: graph convolutional network (GCN), fake reviews, semantic similarity, heterogeneous text graph
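The graph construction summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the PMI weight and the embedding similarity are fused, what window size or similarity threshold is used, or which TF-IDF variant is applied. The sketch assumes that a cosine similarity above a threshold is simply added to the PMI weight, uses a smoothed IDF, and follows the standard two-layer GCN propagation rule of Kipf and Welling. The names `build_text_graph`, `gcn_forward`, `window` and `sim_threshold` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np


def build_text_graph(docs, embeddings, window=3, sim_threshold=0.5):
    """Build a word-review graph. Nodes 0..V-1 are words, V..V+D-1 are reviews.

    docs: list of non-empty token lists; embeddings: dict word -> 1-D array.
    Returns (adjacency matrix with self-loops, sorted vocabulary).
    """
    vocab = sorted({w for d in docs for w in d})
    idx = {w: i for i, w in enumerate(vocab)}
    V, D = len(vocab), len(docs)
    A = np.eye(V + D)  # self-loops on every node

    # Co-occurrence statistics over sliding windows, used for PMI.
    windows = []
    for d in docs:
        if len(d) <= window:
            windows.append(d)
        else:
            windows.extend(d[i:i + window] for i in range(len(d) - window + 1))
    n_win = len(windows)
    word_count, pair_count = Counter(), Counter()
    for w in windows:
        uniq = sorted(set(w))
        word_count.update(uniq)
        pair_count.update(combinations(uniq, 2))

    # Word-word edges from positive PMI.
    for (a, b), n_ab in pair_count.items():
        pmi = math.log(n_ab * n_win / (word_count[a] * word_count[b]))
        if pmi > 0:
            A[idx[a], idx[b]] = A[idx[b], idx[a]] = pmi

    # Word-word edges from embedding similarity.
    # ASSUMPTION: cosine similarity above a threshold is added to the PMI weight.
    for a, b in combinations(vocab, 2):
        if a in embeddings and b in embeddings:
            va, vb = embeddings[a], embeddings[b]
            norm = np.linalg.norm(va) * np.linalg.norm(vb)
            if norm == 0:
                continue
            cos = float(va @ vb / norm)
            if cos > sim_threshold:
                A[idx[a], idx[b]] += cos
                A[idx[b], idx[a]] += cos

    # Word-review edges from TF-IDF (ASSUMPTION: smoothed IDF variant).
    df = Counter(w for d in docs for w in set(d))
    for j, d in enumerate(docs):
        for w, c in Counter(d).items():
            tfidf = (c / len(d)) * (math.log((1 + D) / (1 + df[w])) + 1)
            A[idx[w], V + j] = A[V + j, idx[w]] = tfidf
    return A, vocab


def gcn_forward(A, X, W0, W1):
    """Two-layer GCN: softmax(A_hat @ relu(A_hat @ X @ W0) @ W1)."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    H = np.maximum(A_hat @ X @ W0, 0.0)
    logits = A_hat @ H @ W1
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```

In this sketch the rows of the output for nodes `V..V+D-1` would be read as class probabilities for the reviews; because the second layer aggregates over two hops, a review's prediction depends on words it does not contain but that are linked to its own words, which is one way to read the "high-order feature information" mentioned in the abstract.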

CAO Dongwei, LI Shaomei, CHEN Hongchang. Fake Reviews Detection Method Based on GCN[J]. Computer Engineering and Applications, 2022, 58(3): 181-186.
