Computer Engineering and Applications, 2016, Vol. 52, Issue (7): 86-90.

Section: Big Data and Cloud Computing

Document clustering based on association relations between terms

REN Jianhua, SHEN Yanbin, MENG Xiangfu, WANG Wei   

School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China
Online: 2016-04-01; Published: 2016-04-19

Abstract: Existing vector space models ignore the semantic relationships between terms when representing documents. To address this shortcoming, this paper proposes a novel document vector representation method based on association rules. Within the generalized vector space model (GVSM), the method analyzes frequent co-occurrence relations between terms to obtain term co-occurrence semantics. It then applies association rules to analyze the correlation between related terms, mining the latent association semantics between terms in a document. Each document is represented by a linear weighting of the co-occurrence semantics and the association semantics. Experimental results show that, compared with the bag-of-words (BOW) model and the GVSM model, document clustering based on the proposed association-rule representation produces more accurate results.

Key words: document clustering, association relations, term co-occurrence, document similarity, latent semantics
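The abstract names the pipeline but gives no formulas, thresholds, or weights. The Python sketch below shows one plausible reading of it under stated assumptions, not the authors' implementation: a GVSM-style term-term co-occurrence matrix (row-normalized, an assumption), a pairwise association matrix whose entries are rule confidences kept only when they pass hypothetical `min_support`/`min_confidence` thresholds and have lift > 1, and a final vector that linearly mixes the two projections with a hypothetical weight `alpha`. All function names and the toy corpus are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def build_vocab(docs):
    """Sorted vocabulary so that vector indices are deterministic."""
    return sorted({t for doc in docs for t in doc})


def tf_vector(doc, vocab):
    """Plain bag-of-words (BOW) term-frequency vector."""
    counts = Counter(doc)
    return [float(counts[t]) for t in vocab]


def cooccurrence_matrix(docs, vocab):
    """GVSM-style term-term matrix: entry (i, j) counts documents containing both
    terms; rows are L2-normalized (an assumption, not taken from the paper)."""
    index = {t: k for k, t in enumerate(vocab)}
    n = len(vocab)
    m = [[0.0] * n for _ in range(n)]
    for doc in docs:
        present = sorted({index[t] for t in doc})
        for i in present:
            m[i][i] += 1.0
        for i, j in combinations(present, 2):
            m[i][j] += 1.0
            m[j][i] += 1.0
    for row in m:
        norm = math.sqrt(sum(x * x for x in row))
        if norm > 0:
            row[:] = [x / norm for x in row]
    return m


def association_matrix(docs, vocab, min_support=0.2, min_confidence=0.5):
    """Association semantics: R[i][j] = confidence(i -> j) for pairwise rules that
    meet the (hypothetical) support/confidence thresholds and have lift > 1."""
    index = {t: k for k, t in enumerate(vocab)}
    n, n_docs = len(vocab), len(docs)
    doc_sets = [{index[t] for t in doc} for doc in docs]
    support = [sum(i in s for s in doc_sets) / n_docs for i in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            pair = sum(i in s and j in s for s in doc_sets) / n_docs
            if pair < min_support:
                continue
            confidence = pair / support[i]
            lift = pair / (support[i] * support[j])
            if confidence >= min_confidence and lift > 1.0:
                r[i][j] = confidence
    return r


def project(vec, matrix):
    """Row vector times matrix: spreads each term's weight onto related terms."""
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec)))
            for j in range(len(matrix[0]))]


def represent(doc, vocab, co, assoc, alpha=0.5):
    """Linear weighting of co-occurrence and association semantics (alpha is hypothetical)."""
    tf = tf_vector(doc, vocab)
    d_co, d_as = project(tf, co), project(tf, assoc)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(d_co, d_as)]


def cosine(u, v):
    """Cosine similarity between two document vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy corpus (illustrative only): two "cloud" documents and two "text" documents.
docs = [
    ["cloud", "storage", "data"],
    ["cloud", "computing", "data"],
    ["cluster", "document", "text"],
    ["document", "text", "mining"],
]
vocab = build_vocab(docs)
co = cooccurrence_matrix(docs, vocab)
assoc = association_matrix(docs, vocab)
vecs = [represent(d, vocab, co, assoc) for d in docs]
print(round(cosine(vecs[0], vecs[1]), 3), round(cosine(vecs[0], vecs[2]), 3))
```

The resulting vectors can be fed to any clustering algorithm that uses cosine similarity. Setting `alpha=1.0` keeps only the co-occurrence (GVSM-style) component, which makes it easy to compare the contribution of the association component on a given corpus; the paper's actual weighting scheme is not stated in the abstract.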