Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (1): 158-164. DOI: 10.3778/j.issn.1002-8331.2007-0186

• Pattern Recognition and Artificial Intelligence •

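Online Social Interaction of People with Depression and a Preliminary Screening Algorithm for Suspected-Depression Weibo Posts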

ZHA Guoqing, HU Chaoran, SUN Mingtao, WANG Deqing

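  1. School of Reliability and Systems Engineering, Beihang University, Beijing 100191
  2. College of Arts and Sciences, Boston University, Massachusetts 02212
  3. School of Economics and Management, Beihang University, Beijing 100191
  4. School of Computer Science, Beihang University, Beijing 100191
  • Publication date: 2022-01-01  Release date: 2022-01-06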

Depression Group's Internet Social Interaction and Preliminary Screening Algorithm for Weibo with Suspected Depression

ZHA Guoqing, HU Chaoran, SUN Mingtao, WANG Deqing   

  1. School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China
  2. Graduate School of Arts & Sciences, Boston University, Massachusetts 02212, USA
  3. School of Economics and Management, Beihang University, Beijing 100191, China
  4. School of Computer Science & Engineering, Beihang University, Beijing 100191, China
  • Online: 2022-01-01  Published: 2022-01-06

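Abstract: Research relating social network data to depression often requires manually labeling users as depressed or non-depressed, which is time-consuming and laborious. By collecting and analyzing the Weibo social data of university students, this paper studies and proposes a preliminary screening algorithm for university students' suspected-depression Weibo posts based on depression keywords and semantic expansion: the comprehensive keyword method. The method forms a depression keyword list by constructing a basic keyword list and expanding it semantically with the word embedding model WORD2VEC, then uses that list to compute the semantic similarity of each Weibo post under test and thereby decide whether it is a suspected-depression post. Experimental results on a Weibo dataset of university students in the capital show that the comprehensive keyword method outperforms the SDS questionnaire word-segmentation method and the expert keyword method in screening accuracy. It can quickly and automatically pick out the very small proportion of suspected-depression posts from a massive volume of student Weibo posts, reducing the experts' labeling workload, improving labeling efficiency, and providing a good data-processing foundation for the subsequent precise identification of depression patients (a classification problem).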

Key words: depression, social media, topic model, social behavior analysis, Weibo recognition

Abstract: Research on social network data and depression often requires labeling depressed and non-depressed users manually, which is time-consuming and laborious. Through the collection and analysis of college students' Weibo social data, this paper studies and proposes a preliminary screening algorithm for suspected-depression Weibo posts based on depression keywords and semantic expansion: the comprehensive depression keyword method. The method builds a comprehensive depression keyword table by constructing a basic keyword table and expanding it semantically with the word embedding model WORD2VEC. The table is then used to calculate the semantic similarity of each Weibo post under test, and the similarity determines whether the post is a suspected-depression post. Experimental results on a Weibo dataset of college students in the capital show that the comprehensive depression keyword method is superior to the SDS questionnaire segmentation method and the expert keyword method in screening accuracy. The method can quickly screen out suspected-depression posts, which account for a very small proportion, from a large number of college students' Weibo posts. It reduces the workload of expert labeling, improves labeling efficiency, and provides a good data-processing foundation for the subsequent accurate identification of patients with depression (a classification problem).

Key words: depression, social media, topic model, social behavior analysis, Weibo recognition
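
The screening pipeline summarized in the abstract (basic keyword table, semantic expansion with word embeddings, similarity-based flagging) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy 3-dimensional vectors stand in for trained WORD2VEC embeddings, the function names and both thresholds are hypothetical, and scoring a post by the maximum cosine similarity between any of its tokens and any keyword is an assumed rule that the abstract does not specify.

```python
import math

# Toy vectors standing in for trained WORD2VEC embeddings (illustrative values only).
EMBEDDINGS = {
    "depressed":  [0.9, 0.1, 0.0],
    "hopeless":   [0.8, 0.2, 0.1],
    "insomnia":   [0.7, 0.3, 0.0],
    "exam":       [0.1, 0.9, 0.1],
    "basketball": [0.0, 0.2, 0.9],
    "lunch":      [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(seeds, embeddings, min_sim=0.9):
    """Add every vocabulary word whose similarity to some seed keyword is >= min_sim.

    min_sim is a hypothetical cutoff chosen for this toy data.
    """
    expanded = set(seeds)
    for word, vec in embeddings.items():
        for seed in seeds:
            if seed in embeddings and cosine(vec, embeddings[seed]) >= min_sim:
                expanded.add(word)
                break
    return expanded


def post_score(tokens, keywords, embeddings):
    """Assumed scoring rule: highest token-to-keyword cosine similarity in the post."""
    sims = [
        cosine(embeddings[t], embeddings[k])
        for t in tokens if t in embeddings
        for k in keywords if k in embeddings
    ]
    return max(sims, default=0.0)


def is_suspected(tokens, keywords, embeddings, threshold=0.8):
    """Flag a tokenized post for expert review when its score reaches the threshold."""
    return post_score(tokens, keywords, embeddings) >= threshold


# Basic keyword table with a single seed, expanded through the embeddings.
keywords = expand_keywords({"depressed"}, EMBEDDINGS)
flagged = is_suspected(["insomnia", "exam"], keywords, EMBEDDINGS)
not_flagged = is_suspected(["basketball", "lunch"], keywords, EMBEDDINGS)
```

In practice the posts would first be segmented into tokens and the embeddings trained on a large Weibo corpus; both steps are outside this sketch. Flagged posts are only candidates for expert labeling, in line with the screening role the abstract describes.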