Computer Engineering and Applications, 2012, Vol. 48, Issue (33): 136-141.

• Database, Signal and Information Processing •

Link prediction using semi-supervised learning

CHEN Kejia1,2, HAN Jingyu1,2, ZHENG Zhengzhong1   

  1. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210046, China
  2. Institute of Computer Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  • Online: 2012-11-21 Published: 2012-11-20

Abstract: Link prediction is a key problem in social network analysis: given a known network, it aims to predict new links that are likely to exist. Real networks contain a large number of unlinked node pairs, and the latent information they carry can help with the link prediction task. This paper treats link prediction as a binary classification problem and uses semi-supervised learning techniques to exploit unlabeled data during learning. Two semi-supervised paradigms are used: self-training and co-training. Experimental results on the real-world datasets Enron and DBLP show that exploiting unlabeled data can effectively improve the accuracy of link prediction.

Key words: link prediction, semi-supervised learning, self-training, co-training, social network analysis
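
To make the self-training paradigm named in the abstract concrete, here is a minimal Python sketch of a generic self-training loop applied to node-pair classification. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which features or base classifier the paper uses, so the neighbourhood features in `pair_features`, the toy `CentroidClassifier`, and the `threshold` and `max_rounds` parameters are all hypothetical choices made for this example.

```python
def pair_features(adj, u, v):
    """Neighbourhood features for a node pair: (common neighbours, Jaccard).

    `adj` maps each node to the set of its neighbours (assumed representation).
    """
    common = len(adj[u] & adj[v])
    union = len(adj[u] | adj[v])
    return (float(common), common / union if union else 0.0)


class CentroidClassifier:
    """Toy binary classifier (an assumption): picks the nearer class centroid."""

    def fit(self, xs, ys):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, y in zip(xs, ys) if y == label]
            if not rows:
                raise ValueError("need at least one example of each class")
            self.centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
        return self

    def predict_with_confidence(self, x):
        dist = {
            label: sum((a - b) ** 2 for a, b in zip(x, centre)) ** 0.5
            for label, centre in self.centroids.items()
        }
        label = min(dist, key=dist.get)
        total = dist[0] + dist[1]
        # Confidence lies in [0.5, 1.0]; 0.5 means both centroids are equally far.
        conf = 0.5 if total == 0 else 1.0 - dist[label] / total
        return label, conf


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Generic self-training: repeatedly pseudo-label confident unlabeled pairs.

    `labeled` is a list of (feature_vector, label) with label 1 = link, 0 = no link;
    `unlabeled` is a list of feature vectors. Returns (classifier, enlarged labeled set).
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    clf = CentroidClassifier()
    for _ in range(max_rounds):
        clf.fit([x for x, _ in labeled], [y for _, y in labeled])
        remaining, added = [], 0
        for x in pool:
            label, conf = clf.predict_with_confidence(x)
            if conf >= threshold:
                labeled.append((x, label))  # accept the pseudo-label
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0 or not pool:
            break
    # Refit once more so the returned model reflects every accepted pseudo-label.
    clf.fit([x for x, _ in labeled], [y for _, y in labeled])
    return clf, labeled
```

In this sketch, labeled examples would be node pairs whose link status is known, and the unlabeled pool would be feature vectors built with `pair_features` for unlinked pairs of unknown status. Co-training follows the same pattern, except that two classifiers trained on different feature views supply pseudo-labels to each other.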