Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (5): 21-23. DOI: 10.3778/j.issn.1002-8331.2010.05.007

• Doctoral Forum •

New retrieval algorithm for English texts

GAO Shi-long   

  1. Department of Mathematics, Leshan Normal University, Leshan, Sichuan 614000, China
  • Received: 2009-10-23 Revised: 2009-12-08 Online: 2010-02-11 Published: 2010-02-11
  • Contact: GAO Shi-long

Abstract: This paper proposes a new retrieval algorithm for English texts. First, each English text is mapped to a frequency matrix of order 26, and the dimensionality of the text representation space is reduced through singular value decomposition. Second, the features of the first and second singular value components are fused to obtain complex feature vectors that reflect both the statistical frequency of letters and the sequential structure of the characters in the text. Finally, the cosine similarity between vectors is used to measure the similarity between the query and the documents. Data comparison shows that the algorithm achieves good experimental results and outperforms the classic LSA retrieval algorithm in both retrieval precision and computational efficiency.
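
The pipeline summarized in the abstract can be sketched roughly as follows. This is an illustrative reading only, not the author's implementation: the abstract does not define the frequency matrix, the fusion rule, or the complex similarity. The sketch assumes that the matrix counts adjacent letter pairs, that fusion places the first scaled left singular vector in the real part and the second in the imaginary part, and that similarity is the modulus of the normalized Hermitian inner product. All function names are hypothetical, and the singular vectors are approximated by plain power iteration with a fixed iteration count.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
N = len(ALPHABET)


def frequency_matrix(text):
    # ASSUMPTION: entry [i][j] counts letter i immediately followed by letter j.
    m = [[0.0] * N for _ in range(N)]
    t = text.lower()
    for a, b in zip(t, t[1:]):
        if a in ALPHABET and b in ALPHABET:
            m[ALPHABET.index(a)][ALPHABET.index(b)] += 1.0
    return m


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def matvec(m, v):
    return [dot(row, v) for row in m]


def top_singular_pairs(m, k=2, iters=200):
    # Power iteration on (m^T m) with deflation; returns [(sigma, left_vector)].
    mt = [list(col) for col in zip(*m)]
    tiny = 1e-10 * sum(x * x for row in m for x in row)
    rights, pairs = [], []
    for _ in range(k):
        v = [1.0 + 0.01 * j for j in range(N)]  # deterministic start vector
        for _step in range(iters):
            w = matvec(mt, matvec(m, v))
            for r in rights:  # remove components along vectors already found
                d = dot(w, r)
                w = [x - d * y for x, y in zip(w, r)]
            n = math.sqrt(dot(w, w))
            if n <= tiny:  # remaining spectrum is numerically zero
                v = None
                break
            v = [x / n for x in w]
        if v is None:
            pairs.append((0.0, [0.0] * N))
            continue
        rights.append(v)
        left = matvec(m, v)
        sigma = math.sqrt(dot(left, left))
        u = [x / sigma for x in left]
        # ASSUMPTION: fix the sign so the largest-magnitude entry is positive.
        if max(u, key=abs) < 0:
            u = [-x for x in u]
        pairs.append((sigma, u))
    return pairs


def feature_vector(text):
    # ASSUMPTION: fuse components as sigma1*u1 + i*sigma2*u2.
    (s1, u1), (s2, u2) = top_singular_pairs(frequency_matrix(text))
    return [complex(s1 * a, s2 * b) for a, b in zip(u1, u2)]


def cosine(a, b):
    # ASSUMPTION: modulus of the normalized Hermitian inner product.
    inner = sum(x * y.conjugate() for x, y in zip(a, b))
    na = math.sqrt(sum(abs(x) ** 2 for x in a))
    nb = math.sqrt(sum(abs(y) ** 2 for y in b))
    return abs(inner) / (na * nb) if na > 0 and nb > 0 else 0.0


def rank(query, docs):
    # Returns (similarity, document index) pairs, most similar first.
    q = feature_vector(query)
    scores = [(cosine(q, feature_vector(d)), i) for i, d in enumerate(docs)]
    return sorted(scores, reverse=True)
```

Under these assumptions, `rank("some query", documents)` returns the documents ordered by similarity to the query. Each text is reduced to a 26-component complex vector, so comparing two texts costs one short inner product regardless of text length.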

Key words: text retrieval, feature fusion, frequency matrix, singular value decomposition (SVD)

CLC number: