Computer Engineering and Applications, 2019, Vol. 55, Issue (11): 102-109. DOI: 10.3778/j.issn.1002-8331.1810-0294

Similar User Mining Based on User Interest Topics in Weibo

LI Pengfei¹, DONG Xu¹, ZHONG Zhaoman²,³, LI Cunhua²

1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221000, China
2. School of Computer Engineering, Huaihai Institute of Technology, Lianyungang, Jiangsu 222005, China
3. Software R&D Center, Jiangsu Jinge Network Technology Co., Ltd., Lianyungang, Jiangsu 222005, China

Online: 2019-06-01; Published: 2019-05-30

Abstract: Similar user mining is an important way to improve the quality of social network services. In the big-data era of social networks, accurate similar user mining is valuable to both users and Internet companies, and similar users mined from a user's own interest topics better match what a similar user should be. This paper proposes a similar user mining method based on user interest topics. The method first uses TextRank to extract each user's interest topics, and then trains on the content users have posted to compute the similarity between every pair of words. Four methods for computing the similarity of users' interest-topic words are proposed: CP (Corresponding Position similarity), CPW (Corresponding Position Weighted similarity), AP (All Position similarity), and APW (All Position Weighted similarity). Mining quality is evaluated by the coincidence rate of followees and followers between each user and the mined similar users. With APW similarity, the followee/follower coincidence percentage of similar users is 1.687%, an improvement of 26.3%, 2.8%, and 12.4% over CP, CPW, and AP, respectively. APW also outperforms the traditional text-similarity methods Jaccard similarity, edit distance, and cosine similarity, by 20.4%, 21.2%, and 45.0%, respectively. Therefore, the APW method mines a user's similar users more effectively.
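The first step of the method extracts each user's interest topics with TextRank. The abstract does not give the paper's configuration (word segmentation, stop-word filtering, co-occurrence window, damping factor), so the following is only a minimal, generic TextRank keyword sketch. It assumes a user's posts have already been segmented into words with stop words removed; the function name `textrank_keywords` and all parameter defaults are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Set


def textrank_keywords(
    words: List[str],
    window: int = 3,        # assumed co-occurrence window (in tokens)
    damping: float = 0.85,  # assumed damping factor, as in PageRank
    iterations: int = 50,   # assumed fixed number of update rounds
    top_k: int = 5,         # assumed number of topic words to keep
) -> List[str]:
    """Generic TextRank sketch: rank words on a co-occurrence graph."""
    # Link two distinct words when they appear within `window`
    # consecutive tokens of each other (undirected graph).
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    if not neighbors:
        return []

    # Repeat the update S(v) = (1 - d) + d * sum_{u in N(v)} S(u) / |N(u)|,
    # computing every new score from the previous round's scores.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            v: (1 - damping)
            + damping * sum(scores[u] / len(neighbors[u]) for u in neighbors[v])
            for v in neighbors
        }

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

For example, `textrank_keywords(["football", "goal", "football", "match", "football", "team"], window=2, top_k=3)` ranks the hub word `football` first, because it co-occurs with every other word. A plain dictionary-based graph keeps the sketch dependency-free; a real pipeline would add Chinese word segmentation in front of it.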

Key words: Weibo, similar users, interest topic, text training, user mining
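The abstract compares APW against three traditional text-similarity baselines: Jaccard similarity, edit distance, and cosine similarity. It does not say how the paper applies them to users, so this sketch assumes each user is represented by a list of interest-topic words (for example, TextRank output) and applies each textbook measure to two such lists. The function names and the normalization of edit distance into a similarity score are illustrative assumptions, not the paper's code. The proposed CP/CPW/AP/APW methods are not reproduced here.

```python
import math
from collections import Counter
from typing import Sequence


def jaccard_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """|A ∩ B| / |A ∪ B| over the sets of topic words."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # assumption: two empty topic lists count as identical
    return len(sa & sb) / len(sa | sb)


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def edit_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed normalization of edit distance into [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cosine_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Cosine of the angle between bag-of-words count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return 0.0 if norm == 0 else dot / norm
```

For two users with topic words `["football", "match", "goal"]` and `["football", "goal", "team"]`, Jaccard similarity is 0.5, word-level edit distance is 2, and cosine similarity is 2/3. Note that these measures only count exact word matches, whereas the proposed methods use trained word-to-word similarities.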