Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (14): 126-129.

• Database, Data Mining, Machine Learning •

Application of GCA of word co-occurrence network in topic detection

YANG Fei¹, HUANG Boxiong²

  1. School of Engineering and Information, Zhuhai City Polytechnic, Zhuhai, Guangdong 519090, China
  2. College of Computer and Electronic Information, Guangxi University, Nanning 530004, China
  • Online: 2013-07-15 Published: 2013-07-31

Abstract: Topic detection methods based on word clustering commonly suffer from unstable clustering results, because the results depend heavily on how the clustering objects are initialized. To address this problem, the document set is modeled as a word co-occurrence network, a filtering method is designed to simplify the network, and a Genetic Clustering Algorithm (GCA) is then proposed to cluster the simplified network, thereby extracting hot topics from web documents. Compared with existing methods, the proposed method finds relatively stable topics, which the experiments also confirm, so it is better suited to practical applications.

Key words: topic detection, word co-occurrence network, Genetic Clustering Algorithm (GCA), word clustering algorithm
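
The pipeline summarized in the abstract (build a word co-occurrence network, filter it, then cluster it with a genetic algorithm) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the filtering rule, the chromosome encoding, or the fitness function, so everything below is an assumption made for illustration. Edges are weighted by the number of documents in which two words co-occur, filtering keeps edges at or above a hypothetical `min_weight` threshold, each chromosome assigns one of `k` cluster labels to every word, and fitness is weighted modularity. All function and parameter names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def build_cooccurrence(docs):
    """Count, for each unordered word pair, the documents that contain both words."""
    weights = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            weights[(a, b)] += 1
    return weights


def filter_network(weights, min_weight):
    """Assumed filtering rule: keep edges whose count is at least min_weight."""
    return {edge: w for edge, w in weights.items() if w >= min_weight}


def fitness(labels, edges):
    """Assumed fitness: weighted modularity of the partition given by labels."""
    total = sum(edges.values())
    if total == 0:
        return 0.0
    inside = Counter()  # edge weight that falls inside each cluster
    degree = Counter()  # summed weighted degree of each cluster
    for (a, b), w in edges.items():
        degree[labels[a]] += w
        degree[labels[b]] += w
        if labels[a] == labels[b]:
            inside[labels[a]] += w
    return sum(inside[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)


def genetic_clustering(edges, k, pop_size=30, generations=60,
                       mutation_rate=0.1, seed=0):
    """Simple elitist genetic algorithm; returns a word -> cluster label mapping."""
    rng = random.Random(seed)
    words = sorted({w for edge in edges for w in edge})

    def score(individual):
        return fitness(dict(zip(words, individual)), edges)

    # Each individual is a list of cluster labels, one per word.
    population = [[rng.randrange(k) for _ in words] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: pop_size // 2]  # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]  # random-reset mutation
            children.append(child)
        population = survivors + children
    best = max(population, key=score)
    return dict(zip(words, best))


if __name__ == "__main__":
    docs = [
        ["stock", "market", "bank"],
        ["stock", "bank", "market"],
        ["goal", "match", "team"],
        ["team", "goal", "match", "bank"],
    ]
    network = filter_network(build_cooccurrence(docs), min_weight=2)
    print(genetic_clustering(network, k=2))
```

Because the survivors of each generation are carried over unchanged, the best fitness in the population never decreases from one generation to the next. Fixing `seed` makes a run reproducible, but that is only a convenience of this sketch and is not the stability property the paper reports.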