Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (34): 137-140.

• Database, Signal and Information Processing •

Study on key technology of automatic indexing of MicroBlog (微博自动标引关键技术的研究)

CHENG Chuanpeng, XIA Minjie (程传鹏, 夏敏捷)

  1. School of Computer Science, Zhongyuan Institute of Technology, Zhengzhou 450007, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-12-01 Published: 2011-12-01

Abstract: Based on the characteristics of microblog text, this paper proposes a method for automatically identifying indexing words in microblogs. The adjacency matrix of a graph is constructed from the semantic similarity between the nouns or verbs in the microblog text. On top of this adjacency matrix, the importance of each word is computed following the idea of the PageRank algorithm, and the words with the highest importance are selected as indexing words. Experimental results show that, compared with traditional automatic indexing methods, the proposed method is simple, practical, and more accurate.

Key words: microblog, automatic indexing, adjacency matrix, PageRank, importance
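
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy similarity table, the damping factor of 0.85, the iteration count, and the cutoff `k` are all assumptions made for illustration, and the candidate words are assumed to have already been filtered to nouns and verbs. The semantic similarity measure used in the paper is not specified on this page, so a hypothetical lookup table stands in for it.

```python
def build_adjacency(words, similarity):
    """Weighted adjacency matrix: entry [i][j] is the similarity of words i and j."""
    n = len(words)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                adj[i][j] = similarity(words[i], words[j])
    return adj


def pagerank(adj, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration over the adjacency matrix."""
    n = len(adj)
    out_weight = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            # Each neighbour i passes on a share of its score proportional
            # to the weight of edge (i, j) among all of i's edges.
            incoming = sum(
                scores[i] * adj[i][j] / out_weight[i]
                for i in range(n)
                if out_weight[i] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def top_indexing_words(words, similarity, k=2):
    """Return the k words with the highest importance scores."""
    scores = pagerank(build_adjacency(words, similarity))
    ranked = sorted(zip(words, scores), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:k]]


# Hypothetical similarity values, for illustration only.
TOY_SIMILARITY = {
    frozenset(["microblog", "indexing"]): 0.6,
    frozenset(["microblog", "keyword"]): 0.5,
    frozenset(["indexing", "keyword"]): 0.8,
}


def toy_similarity(a, b):
    # Unlisted pairs get a small default similarity (also an assumption).
    return TOY_SIMILARITY.get(frozenset([a, b]), 0.1)


candidates = ["microblog", "indexing", "keyword", "weather"]
print(top_indexing_words(candidates, toy_similarity, k=2))
```

In this sketch, words that are semantically close to many other words in the post accumulate more score, while an isolated word such as "weather" ends up at the bottom of the ranking. That matches the intuition behind selecting the most important words as indexing words.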