Web text clustering based on concept lattice

doi:10.3778/j.issn.1002-8331.2008.23.052

Computer Engineering and Applications, 2008, Vol. 44, Issue (23): 169-171.

• Database, Signal and Information Processing •

LI Yun (李云), TIAN Su-fang (田素方), LI Tuo (李拓), XU Tao (徐涛)

Institute of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225009, China

Received: 2007-10-09  Revised: 2007-12-17  Online: 2008-08-11  Published: 2008-08-11
Corresponding author: LI Yun

Abstract

Most Web text clustering methods are based on the vector space model of text representation. This model ignores the semantic relations among feature terms, and the dimension of the term space is very high, which leads to a loss of the text's semantic information and an increase in time complexity. In this paper, each text is treated as an object and the feature terms in the text as its corresponding attributes, which forms a formal context over the texts. Concepts extracted from this formal context are used to represent the texts and to measure the similarity between them. This reduces the dimension of the feature terms and lowers the computational complexity. The method achieves good clustering results, and theoretical analysis shows that it is effective.

Key words: Web document, clustering, concept lattice, reduction
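As a rough illustration of the idea summarized in the abstract, the following sketch builds a toy formal context (documents as objects, feature terms as attributes), enumerates its formal concepts, and compares documents by the concepts they share. The documents, terms, function names, and the Jaccard-style similarity are assumptions made for illustration only; the abstract does not state the paper's actual similarity measure or lattice-construction algorithm.

```python
from itertools import combinations

# Toy formal context: document (object) -> feature terms (attributes).
# These documents and terms are invented for illustration.
context = {
    "d1": {"lattice", "concept", "cluster"},
    "d2": {"lattice", "concept"},
    "d3": {"web", "cluster"},
    "d4": {"web", "search"},
}

def common_terms(docs, context):
    """Derivation on objects: the terms shared by every document in docs."""
    shared = set().union(*context.values())
    for d in docs:
        shared &= context[d]
    return frozenset(shared)

def docs_having(terms, context):
    """Derivation on attributes: the documents that contain every term."""
    return frozenset(d for d, t in context.items() if terms <= t)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of documents.

    This brute-force closure is exponential in the number of documents and is
    only meant for a toy context.
    """
    docs = list(context)
    concepts = set()
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            intent = common_terms(subset, context)
            extent = docs_having(intent, context)
            concepts.add((extent, intent))
    return concepts

def concept_similarity(a, b, concepts):
    """Assumed measure: Jaccard overlap of the concepts containing each document."""
    ca = {c for c in concepts if a in c[0]}
    cb = {c for c in concepts if b in c[0]}
    return len(ca & cb) / len(ca | cb)

concepts = formal_concepts(context)
for a, b in combinations(context, 2):
    print(a, b, round(concept_similarity(a, b, concepts), 3))
```

In this toy context, documents that share more feature terms fall under more common concepts and therefore score higher, so a pairwise similarity of this kind could be handed to any standard clustering procedure.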

LI Yun, TIAN Su-fang, LI Tuo, XU Tao. Web text clustering based on concept lattice[J]. Computer Engineering and Applications, 2008, 44(23): 169-171.
