Computer Engineering and Applications (计算机工程与应用) ›› 2022, Vol. 58 ›› Issue (3): 181-186. DOI: 10.3778/j.issn.1002-8331.2008-0125

• Pattern Recognition and Artificial Intelligence •

Fake Review Detection Method Based on GCN

CAO Dongwei (曹东伟), LI Shaomei (李邵梅), CHEN Hongchang (陈鸿昶)

  1. Zhongyuan Network Security Research Institute, Zhengzhou University, Zhengzhou 450000, China
  2. People’s Liberation Army Strategic Support Force Information Engineering University, Zhengzhou 450000, China
  • Online: 2022-02-01 Published: 2022-01-28

Abstract: User reviews on service websites are an important reference for consumer choice. Driven by commercial interests, review sites are flooded with reviews that do not reflect the true characteristics of products, so detecting and managing fake reviews matters for supervising website operation and cleaning up the online environment. To improve fake review detection, this paper builds on text classification with a graph neural network constructed over words and documents, and proposes a detection method based on a graph convolutional network that incorporates semantic similarity (semantic-graph convolution networks, Sem-GCN). Word-word edges are built from the PMI (pointwise mutual information) index and from semantic similarity measured on word embeddings; word-review edges are built from TF-IDF feature values. The message-passing property of the graph neural network is then used to aggregate and extract node features in the resulting word-review heterogeneous text graph, capturing high-order feature information between word and review nodes for classification. On a public dataset, the proposed method improves accuracy by 7%, 4.8%, and 1.3% over CNN, LSTM, and Text-GCN, respectively.

Key words: graph convolution networks (GCN), fake reviews, semantic similarity, heterogeneous text graph
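
The graph construction and propagation described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. Several details are assumptions that the abstract does not specify: the sliding-window size, keeping only positive PMI values, the cosine-similarity threshold, merging PMI and semantic edges by taking the larger weight, the smoothed IDF variant, the random stand-in embeddings, and the untrained two-layer network. The helper names `build_graph`, `normalize`, and `gcn_forward` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np


def build_graph(docs, embeddings, window=3, sim_threshold=0.8):
    """Return (A, vocab) for a word-review graph; word nodes first, then reviews."""
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}
    n_words, n_docs = len(vocab), len(docs)
    A = np.zeros((n_words + n_docs, n_words + n_docs))

    # Word-word edges from PMI over sliding windows (assumption: positive PMI only).
    windows = []
    for doc in docs:
        if len(doc) <= window:
            windows.append(doc)
        else:
            windows.extend(doc[i:i + window] for i in range(len(doc) - window + 1))
    word_count, pair_count = Counter(), Counter()
    for win in windows:
        uniq = sorted(set(win))
        word_count.update(uniq)
        pair_count.update(combinations(uniq, 2))
    for (a, b), n_ab in pair_count.items():
        # PMI = log( p(a,b) / (p(a) p(b)) ) with probabilities estimated per window.
        pmi = math.log(n_ab * len(windows) / (word_count[a] * word_count[b]))
        if pmi > 0:
            A[idx[a], idx[b]] = A[idx[b], idx[a]] = pmi

    # Word-word edges from embedding cosine similarity (assumption: thresholded,
    # merged with the PMI edges by taking the larger weight).
    E = np.stack([embeddings[w] for w in vocab])
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sim = E @ E.T
    np.fill_diagonal(sim, 0.0)
    sem = np.where(sim >= sim_threshold, sim, 0.0)
    A[:n_words, :n_words] = np.maximum(A[:n_words, :n_words], sem)

    # Word-review edges weighted by TF-IDF (assumption: smoothed IDF variant).
    df = Counter(w for doc in docs for w in set(doc))
    for d, doc in enumerate(docs):
        for w, n in Counter(doc).items():
            idf = math.log((1 + n_docs) / (1 + df[w])) + 1
            weight = (n / len(doc)) * idf
            A[idx[w], n_words + d] = A[n_words + d, idx[w]] = weight
    return A, vocab


def normalize(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used by GCN layers."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(A_norm, X, W0, W1):
    """Two-layer GCN forward pass: softmax(A ReLU(A X W0) W1)."""
    H = np.maximum(A_norm @ X @ W0, 0.0)
    logits = A_norm @ H @ W1
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Toy reviews and random stand-in embeddings (both made up for illustration).
docs = [
    "great food friendly staff".split(),
    "best place ever best food ever".split(),
    "friendly staff slow service".split(),
]
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=8) for d in docs for w in d}

A, vocab = build_graph(docs, embeddings)
A_norm = normalize(A)
n = A.shape[0]
X = np.eye(n)                              # one-hot node features
W0 = rng.normal(scale=0.1, size=(n, 16))   # untrained weights
W1 = rng.normal(scale=0.1, size=(16, 2))   # two classes: genuine / fake
probs = gcn_forward(A_norm, X, W0, W1)
review_probs = probs[len(vocab):]          # rows for the review nodes only
print(review_probs.shape)
```

Because the weights here are random, the class probabilities carry no meaning; in practice the weights would be trained with a cross-entropy loss on the labeled review nodes, and the embeddings would come from a pretrained word-embedding model.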