Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (31): 118-121. DOI: 10.3778/j.issn.1002-8331.2009.31.035

• Database, Signal and Information Processing •

Chinese search result clustering based on key noun phrase clustering

MA Xue-yun1, XIAO Shi-bin1,2, WANG Hong-wei1,2, SHI Shui-cai1,2

  1. Chinese Information Processing Research Center, Beijing Information Science and Technology University, Beijing 100101, China
  2. Beijing TRS Information Technology Co., Ltd., Beijing 100101, China
  • Received: 2008-06-16 Revised: 2008-10-16 Online: 2009-11-01 Published: 2009-11-01
  • Contact: MA Xue-yun

Abstract: Most existing search result clustering methods are document-based and cannot generate meaningful cluster labels. To address this problem, this paper proposes a Chinese search result clustering method based on key noun phrase clustering. The method takes noun phrases and related search terms as candidate cluster labels, filters the candidates using the C-Value algorithm and IDF values, clusters the remaining labels with the Chameleon algorithm, and finally assigns each search result to its most relevant cluster. Experiments show that using key noun phrases and related search terms as cluster labels effectively improves the descriptiveness of the labels and reduces the time complexity of the clustering algorithm.

Key words: search result clustering, key noun phrase extraction, C-Value algorithm, Chameleon algorithm

CLC Number:
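
The label-filtering step named in the abstract relies on C-Value scoring. As a rough illustration only, the sketch below applies the commonly cited C-Value formula to a toy set of candidate phrases: a phrase that is not nested in any longer candidate scores log2(length) × frequency, while a nested phrase has the mean frequency of its containing candidates subtracted first. The function names `is_nested` and `c_value`, the token tuples, and the frequencies are all hypothetical; the abstract does not say which C-Value variant the authors use or how they combine it with IDF, so this should not be read as their implementation.

```python
import math
from collections import defaultdict


def is_nested(short, long):
    """Return True if `short` occurs as a contiguous token run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(freq):
    """Score candidate phrases with the commonly cited C-Value formula.

    `freq` maps each candidate phrase (a tuple of segmented tokens) to its
    frequency. Note that log2(1) = 0, so single-token candidates score 0
    under this formulation (an assumption; some variants adjust for this).
    """
    phrases = list(freq)

    # For each candidate, collect the longer candidates that contain it.
    containers = defaultdict(list)
    for a in phrases:
        for b in phrases:
            if len(b) > len(a) and is_nested(a, b):
                containers[a].append(b)

    scores = {}
    for a in phrases:
        weight = math.log2(len(a))
        if containers[a]:
            # Nested phrase: discount by the mean frequency of its containers.
            mean_container_freq = sum(freq[b] for b in containers[a]) / len(containers[a])
            scores[a] = weight * (freq[a] - mean_container_freq)
        else:
            # Not nested in any longer candidate: length weight times frequency.
            scores[a] = weight * freq[a]
    return scores


# Hypothetical candidate phrases and frequencies, for illustration only.
example_freq = {
    ("search", "result", "clustering"): 4,
    ("result", "clustering"): 6,
    ("search", "result"): 5,
}
example_scores = c_value(example_freq)
for phrase, score in sorted(example_scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(phrase), round(score, 3))
```

In a pipeline like the one the abstract outlines, scores of this kind would presumably be thresholded (together with IDF) to keep only the strongest candidate labels before clustering; the thresholds themselves are not given in the abstract.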