计算机工程与应用 ›› 2007, Vol. 43 ›› Issue (8): 149-153.

• 数据库与信息处理 • 上一篇    下一篇

一种索引结构的压缩存储及其上的查询处理技术

骆吉洲 李建中   

  1. 哈尔滨工业大学数据库研究中心 黑龙江大学 计算机科学技术学院
  • 收稿日期:2006-04-10 修回日期:1900-01-01 出版日期:2007-03-11 发布日期:2007-03-11
  • 通讯作者: 骆吉洲

The compression and query processing method of a kind of index

Jizhou Luo Jianzhong Li   

  • Received:2006-04-10 Revised:1900-01-01 Online:2007-03-11 Published:2007-03-11
  • Contact: Jizhou Luo

摘要: 在全文信息检索系统中,存储文本及其上关键词的索引结构需要大量的空间。位图索引不能支持基于信息量的查询,倒排文件需要的空间比较大。本文提出了频率向量这种索引结构的压缩存储方法,设计并实现了基于这种压缩存储方法的存储结构,理论分析表明该压缩方法与存储结构可以获得较高的压缩比;此外,本文还讨论了压缩频率向量上的查询处理技术,实验结果表明这种压缩的索引结构能够保证查询结果的完备性并能有效地提高频率向量的存储和查询效率。

关键词: 频率向量, 压缩, 离散化, 查询处理, 倒排索引

Abstract: In full-text retrieval systems, keyword-based indexes is always an important technique for efficient information retrieval. Existing bitmaps can’t support queries based on the quantum of keywords and inverted files need a large amount of storage space. A compression method and a storage structure for a kind of index named frequency vectors is presented in this paper. Theoretical analysis gives a upper bound of compression ratio. Query processing method based on the compressed index is also discussed. Experimental results indicate that our compressed index can guarantee to obtain complete query results and high efficency.

Key words: frequency vector, compression, discretization, query processing, inverted index