Model of text representation based on concept

Computer Engineering and Applications, 2008, Vol. 44, Issue 20: 162-164. DOI: 10.3778/j.issn.1002-8331.2008.20.049

Section: Database, Signal and Information Processing


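CHEN Long, FAN Rui-xia, GAO Qi

Institute of Pattern Recognition and Intelligent Systems, Beijing Institute of Technology, Beijing 100081, China

Received: 2007-09-27  Revised: 2008-01-23  Online: 2008-07-11  Published: 2008-07-11
Corresponding author: CHEN Long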


Abstract

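Abstract: Text information processing is moving toward semantics, yet the dominant text representation model today, the Vector Space Model (VSM), uses individual words as feature terms. This ignores the semantic relations between words in natural language, so the VSM cannot cope with the synonymy and polysemy that pervade real text, which severely reduces the accuracy of text information processing. Drawing on techniques and results from natural language processing, this paper introduces concepts and concept distance into the VSM and, approaching the problem at the level of semantics and concepts, builds a concept-based text representation model in which concepts serve as the feature terms of a text. Experiments show that the method handles synonymy and polysemy well and improves both the recall and the precision of text classification.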

Key words: text representation model, concept, concept distance
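The contrast the abstract draws between word features and concept features can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word-to-concept dictionary `WORD_TO_CONCEPT`, the toy concept hierarchy `PARENT`, raw term-frequency weights, and edge counting through the lowest common ancestor as the concept distance are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Hypothetical word -> concept dictionary, for illustration only.
# The abstract does not say which lexical resource the authors used.
WORD_TO_CONCEPT = {
    "car": "vehicle", "automobile": "vehicle",
    "buy": "purchase", "purchase": "purchase",
    "cheap": "low_price", "inexpensive": "low_price",
}

# Hypothetical concept hierarchy: child concept -> parent concept.
PARENT = {
    "vehicle": "artifact", "tool": "artifact",
    "artifact": "entity", "purchase": "event", "event": "entity",
}

def word_vector(tokens):
    """Classic VSM: every distinct word is a feature, weighted by term frequency."""
    return Counter(tokens)

def concept_vector(tokens):
    """Concept VSM: map each word to its concept; unmapped words keep their surface form."""
    return Counter(WORD_TO_CONCEPT.get(t, t) for t in tokens)

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_sq = sum(x * x for x in u.values()) * sum(x * x for x in v.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0

def ancestors(concept):
    """Path from a concept up to the root of the hierarchy, including itself."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def concept_distance(a, b):
    """Edge count between two concepts via their lowest common ancestor; None if unrelated."""
    up_a, up_b = ancestors(a), ancestors(b)
    for i, node in enumerate(up_a):
        if node in up_b:
            return i + up_b.index(node)
    return None

doc_a = ["buy", "cheap", "car"]
doc_b = ["purchase", "inexpensive", "automobile"]
print(cosine(word_vector(doc_a), word_vector(doc_b)))        # 0.0
print(cosine(concept_vector(doc_a), concept_vector(doc_b)))  # 1.0
print(concept_distance("vehicle", "tool"))                    # 2
```

The two paraphrased documents share no words, so their word-based cosine is 0, while mapping synonyms onto shared concepts makes their vectors identical. The sketch covers synonymy only; handling polysemy would additionally require choosing among a word's candidate concepts from context, which is not shown here.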


Cite this article: CHEN Long, FAN Rui-xia, GAO Qi. Model of text representation based on concept[J]. Computer Engineering and Applications, 2008, 44(20): 162-164.
