Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (19): 119-124.

• Database, Data Mining, Machine Learning •

A Study of the Complex Network Characteristics of Microblog Language

马宏炜,陆  蓓,谌志群,黄孝喜,王荣波   

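  1. Institute of Cognitive and Intelligent Computing, School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018
  • Publication date: 2015-09-30 Release date: 2015-10-13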

Research on MicroBlog language characteristics based on complex network

MA Hongwei, LU Bei, CHEN Zhiqun, HUANG Xiaoxi, WANG Rongbo   

  1. Institute of Cognitive and Intelligent Computing, Hangzhou Dianzi University, Hangzhou 310018, China
  • Online: 2015-09-30 Published: 2015-10-13

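Abstract: Based on a large-scale microblog corpus, three word co-occurrence language networks are constructed and analyzed with complex network analysis tools. The main aim is to explore the feasibility of applying complex network analysis methods to microblog text, and then to study the distinctive characteristics of microblog language networks. The results show that complex network analysis is feasible on microblog text, and that the relevant network parameters, such as degree distribution, clustering coefficient, and average shortest path, reflect the stylistic characteristics of microblog language. This study not only extends the application of complex network methods in linguistics, but also provides a feasible route to complex-network-based mining of microblog content.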

Keywords: microblog, language characteristics, language network, complex network

Abstract: Based on a large-scale MicroBlog text corpus, three different MicroBlog word co-occurrence language networks are constructed, and their network characteristics are analyzed using complex network analysis tools. The main purpose of this paper is to explore the feasibility of applying complex network analysis methods to MicroBlog text and, on that basis, to study the distinctive characteristics of MicroBlog language networks. The experimental results show that complex network methods are feasible for MicroBlog text, and that complex network parameters such as degree distribution, clustering coefficient, and average shortest path length reflect the stylistic characteristics of MicroBlog language. This research extends the application of complex network methods in linguistics and provides a feasible approach to complex-network-based content mining of MicroBlog text.

Key words: MicroBlog, language characteristics, language network, complex network
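
The three measures named in the abstract (degree distribution, clustering coefficient, average shortest path) can be illustrated on a toy word co-occurrence network. The sketch below is not the authors' implementation: the abstract does not state the co-occurrence window, the word segmentation tool, or whether edges are weighted, so this example assumes pre-tokenized input, a hypothetical window of 2 tokens (adjacent words only), and an undirected, unweighted graph. All function names are illustrative.

```python
from collections import Counter, deque
from itertools import combinations


def build_cooccurrence_network(sentences, window=2):
    """Link two distinct words when both fall inside a sliding window of `window` tokens.

    Assumption: `sentences` is already segmented into word lists.
    """
    graph = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            graph.setdefault(word, set())
            for other in tokens[i + 1 : i + window]:
                if other != word:
                    graph[word].add(other)
                    graph.setdefault(other, set()).add(word)
    return graph


def degree_distribution(graph):
    """Return {degree k: fraction of nodes whose degree is k}."""
    counts = Counter(len(nbrs) for nbrs in graph.values())
    return {k: c / len(graph) for k, c in sorted(counts.items())}


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among the neighbours of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def average_shortest_path(graph):
    """Mean BFS distance over all ordered pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy input standing in for segmented microblog posts (hypothetical data).
sentences = [["a", "b", "c"], ["a", "c", "d"]]
network = build_cooccurrence_network(sentences)
print(degree_distribution(network))
print(average_clustering(network))
print(average_shortest_path(network))
```

On a real corpus the same three numbers would typically be compared against a random graph of equal size to judge small-world and scale-free behaviour; that comparison is outside this sketch.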