Computer Engineering and Applications, 2022, Vol. 58, Issue (12): 94-101. DOI: 10.3778/j.issn.1002-8331.2012-0554

Network, Communication and Security

Representative User Sampling Algorithm Based on Weighted Neighborhood

HE Shuimiao, BAN Zhijie   

  1. Key Laboratory of Social Computing and Data Processing in Inner Mongolia Autonomous Region, School of Computer, Inner Mongolia University, Hohhot 010020, China
  • Online: 2022-06-15   Published: 2022-06-15

Abstract: Representative user sampling is widely used in social network analysis, and making the sampled subset represent all users in the network is an important research problem. Existing methods pay little attention to the large amount of useful latent information about users contained in the network topology. By optimizing the statistical stratified sampling model, this paper proposes a representative user sampling algorithm based on weighted neighborhood. To extract more valuable information from the network topology, the algorithm uses the weighted neighborhood to improve the calculation of user representativeness and combines it with user attributes. Users are then divided into attribute groups according to their attribute values, and each user's representativeness within each attribute group is calculated. Next, a quality function measures how well the selected users represent the network, and a heuristic greedy algorithm extracts the representative users. Experiments against six traditional sampling algorithms on four datasets show that the representative user sampling algorithm based on weighted neighborhood improves precision, recall, and F1-Measure.

Key words: social network, representative user sampling, weighted neighborhood, topological structure, user representativeness
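
The abstract summarizes the pipeline without giving the paper's formulas. The Python sketch below walks through the same four steps (weighted-neighborhood representativeness, attribute grouping, a quality function, greedy selection) under choices that are assumptions, not the authors' definitions: a user's weighted neighborhood is taken to be its neighbors with their edge weights plus the user itself with weight 1.0; representativeness is the weighted Jaccard similarity of two such neighborhoods; groups are keyed by (attribute, value) pairs; and the quality function credits each user with its best representativeness by a sampled user from the same group. All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical input format: undirected weighted graph as {user: {neighbor: edge_weight}}.
Graph = dict[str, dict[str, float]]


def weighted_neighborhood(graph: Graph, u: str) -> dict[str, float]:
    """Closed weighted neighborhood of u; assumption: u itself gets weight 1.0."""
    nbhd = dict(graph.get(u, {}))
    nbhd[u] = 1.0
    return nbhd


def representativeness(graph: Graph, s: str, u: str) -> float:
    """Assumed measure of how well s represents u: weighted Jaccard
    similarity of their closed weighted neighborhoods (1.0 when s == u)."""
    a, b = weighted_neighborhood(graph, s), weighted_neighborhood(graph, u)
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def attribute_groups(attrs: dict[str, dict[str, str]]) -> dict[tuple[str, str], list[str]]:
    """Put each user into one group per (attribute, value) pair."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for user, values in attrs.items():
        for attr, value in values.items():
            groups[(attr, value)].append(user)
    return dict(groups)


def quality(graph: Graph, groups: dict[tuple[str, str], list[str]], sample: list[str]) -> float:
    """Assumed quality function: each group member is credited with its best
    representativeness by a sampled user from the SAME group (stratification),
    averaged over all group memberships. The result lies in [0.0, 1.0]."""
    total = sum(len(m) for m in groups.values())
    covered = 0.0
    for members in groups.values():
        member_set = set(members)
        reps = [s for s in sample if s in member_set]
        for u in members:
            covered += max((representativeness(graph, s, u) for s in reps), default=0.0)
    return covered / total if total else 0.0


def greedy_sample(graph: Graph, groups, candidates: list[str], k: int) -> list[str]:
    """Greedy selection: repeatedly add the candidate that gives the highest
    quality, stopping after k picks or when no candidates remain."""
    sample: list[str] = []
    for _ in range(k):
        best, best_q = None, -1.0
        for c in candidates:
            if c in sample:
                continue
            q = quality(graph, groups, sample + [c])
            if q > best_q:
                best, best_q = c, q
        if best is None:
            break
        sample.append(best)
    return sample


if __name__ == "__main__":
    # Toy network: a triangle a-b-c and a pair d-e, with one attribute "g".
    graph = {
        "a": {"b": 1.0, "c": 1.0},
        "b": {"a": 1.0, "c": 1.0},
        "c": {"a": 1.0, "b": 1.0},
        "d": {"e": 1.0},
        "e": {"d": 1.0},
    }
    attrs = {"a": {"g": "x"}, "b": {"g": "x"}, "c": {"g": "x"},
             "d": {"g": "y"}, "e": {"g": "y"}}
    groups = attribute_groups(attrs)
    print(greedy_sample(graph, groups, sorted(graph), k=2))  # ['a', 'd']
```

Under these assumptions the quality function is a facility-location-style coverage objective, which is monotone and submodular, so plain greedy selection carries the standard (1 - 1/e) approximation guarantee for a fixed sample size k; the abstract does not say whether the paper's own quality function has this property. The sketch recomputes the full quality for every candidate, which is simple but slow; a practical version would cache each user's current best score and evaluate only marginal gains.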
