Clustering method based on concept and semantic similarity

Abstract

This paper introduces a document clustering method based on concept and semantic similarity: Text Clustering Based on Concept and Semantic Similarity (TCBCSS). Key concepts, rather than keywords, are extracted to form a semantic network. The network is analyzed using the Six Degrees of Separation (small-world) theory and its geometric characteristics to build concept lists that represent each document. This representation not only addresses the problem of the same meaning being expressed in different words but also simplifies similarity computation. TCBCSS uses the semantic similarity between two concept lists as the measure of similarity between the corresponding documents and clusters the documents on a graph, which avoids the restrictions that some clustering algorithms place on cluster shape. Experimental results show that TCBCSS improves clustering quality.

Key words: text clustering, concept, text representation, Six Degrees of Separation, semantic similarity

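Abstract (translated from the Chinese): A clustering algorithm based on concept and semantic similarity, TCBCSS (Text Clustering Based on Concept and Semantic Similarity), is proposed. TCBCSS uses WordNet to extract and merge document concepts into a semantic network, analyzes the network using small-world theory and the network's geometric characteristics, and builds concept lists to represent documents. This not only effectively resolves the "expression difference" problem but also facilitates the computation of document similarity. TCBCSS uses the semantic similarity of two concept lists as the measure of closeness between documents and performs cluster analysis on a graph, which avoids the restrictions that some clustering algorithms place on cluster shape. Experiments show that TCBCSS improves clustering quality.

Key words (translated from the Chinese): text clustering, concept, text representation, small-world theory, semantic similarity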
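
The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes: documents represented as concept lists, document similarity derived from concept-list similarity, and clustering carried out on a graph. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy similarity table CONCEPT_SIM (a real system might derive such values from WordNet), the best-match averaging in list_sim, the threshold value, and the use of connected components as the graph clustering step.

```python
from itertools import combinations

# Hypothetical toy concept-to-concept similarities in [0, 1].
# Pairs that are not listed are treated as unrelated (similarity 0).
CONCEPT_SIM = {
    frozenset(("car", "automobile")): 1.0,
    frozenset(("car", "truck")): 0.8,
    frozenset(("bank", "finance")): 0.7,
}


def concept_sim(a, b):
    """Similarity of two concepts; identical concepts score 1.0."""
    if a == b:
        return 1.0
    return CONCEPT_SIM.get(frozenset((a, b)), 0.0)


def list_sim(xs, ys):
    """Symmetric average of best-match similarities between two concept lists."""
    if not xs or not ys:
        return 0.0
    fwd = sum(max(concept_sim(x, y) for y in ys) for x in xs) / len(xs)
    bwd = sum(max(concept_sim(y, x) for x in xs) for y in ys) / len(ys)
    return (fwd + bwd) / 2


def cluster(docs, threshold=0.5):
    """Link documents whose similarity reaches the threshold and
    return the connected components of the resulting graph."""
    names = list(docs)
    adj = {n: set() for n in names}
    for a, b in combinations(names, 2):
        if list_sim(docs[a], docs[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)

    seen, clusters = set(), []
    for n in names:
        if n in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, comp = [n], set()
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adj[cur] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters


# Toy documents, each already reduced to a concept list.
docs = {
    "d1": ["car", "engine"],
    "d2": ["automobile", "truck"],
    "d3": ["bank", "loan"],
    "d4": ["finance", "loan"],
}

print(cluster(docs))
```

In this toy run, d1 and d2 share no literal term yet are linked through the concept similarities, which is the kind of "expression difference" the abstract refers to. Because clusters are read off the graph rather than fitted around centroids, no particular cluster shape is presupposed.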

JIAO Fenfen. Clustering method based on concept and semantic similarity[J]. Computer Engineering and Applications, 2012, 48(18): 136-141.

焦芬芬. 基于概念和语义相似度的文本聚类算法[J]. 计算机工程与应用, 2012, 48(18): 136-141.

