Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (18): 99-104.

• Database, Data Mining, Machine Learning •

Storage model optimization toward high dimensional sparse data based on slicing

SHAO Huimeng, SHU Hongping, ZHENG Jiaoling, XU Yuanping, WEN Liyu

  1. Department of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
  • Online: 2013-09-15 Published: 2013-09-13

Abstract: In large databases, empty fields in high-dimensional sparse relational tables can occupy a large amount of storage space. To address this problem, a slicing-based database structure is designed that uses a traditional row-store database to emulate the working principle of a column-store database. Experimental analysis shows that storage space is reduced by about 27.42% compared with the original schema. When five fields of the high-dimensional sparse data are queried, the number of I/O data blocks falls to 35.27% of that of the original schema; when four fields are queried, it falls to 28.22%. The number of I/O data blocks decreases further as fewer fields are queried, which improves database access efficiency.

Key words: high-dimensional, sparse data, column-store database
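
The abstract does not spell out the slicing scheme, so the following is only a minimal sketch of the general idea under stated assumptions: a wide sparse table is split vertically into narrow "slice" tables inside an ordinary row-store database (SQLite here), a record gets a row in a slice only if it has a non-null field there, and a query reads only the slices that hold the requested fields. The slice width, the table and column names (`slice_k`, `cN`), and the helper functions are all hypothetical and are not taken from the paper.

```python
import sqlite3

SLICE_WIDTH = 4  # hypothetical: number of attribute columns per slice table


def slice_of(col_index):
    """Return (slice table name, column name) for an attribute index."""
    return f"slice_{col_index // SLICE_WIDTH}", f"c{col_index}"


def create_slices(conn, n_cols):
    """Create one narrow table per group of SLICE_WIDTH attributes."""
    for start in range(0, n_cols, SLICE_WIDTH):
        stop = min(start + SLICE_WIDTH, n_cols)
        cols = ", ".join(f"c{i} TEXT" for i in range(start, stop))
        conn.execute(
            f"CREATE TABLE slice_{start // SLICE_WIDTH} "
            f"(id INTEGER PRIMARY KEY, {cols})"
        )


def insert_sparse(conn, row_id, values):
    """Store a sparse record; slices with no non-null field get no row."""
    by_slice = {}
    for col_index, value in values.items():
        table, col = slice_of(col_index)
        by_slice.setdefault(table, {})[col] = value
    for table, fields in by_slice.items():
        names = ", ".join(fields)
        marks = ", ".join("?" for _ in fields)
        conn.execute(
            f"INSERT INTO {table} (id, {names}) VALUES (?, {marks})",
            (row_id, *fields.values()),
        )


def query_fields(conn, col_indexes):
    """Read the requested attributes, touching only the slices holding them."""
    result = {}
    for col_index in col_indexes:
        table, col = slice_of(col_index)
        sql = f"SELECT id, {col} FROM {table} WHERE {col} IS NOT NULL"
        for row_id, value in conn.execute(sql):
            result.setdefault(row_id, {})[col_index] = value
    return result


conn = sqlite3.connect(":memory:")
create_slices(conn, n_cols=12)            # 12 attributes -> 3 slice tables
insert_sparse(conn, 1, {0: "a", 9: "b"})  # record 1 touches slice_0 and slice_2
insert_sparse(conn, 2, {5: "c"})          # record 2 touches slice_1 only
print(query_fields(conn, [0, 5]))
```

In this sketch, a query on fields 0 and 5 never reads `slice_2`, which is the kind of effect the abstract reports as a reduction in I/O data blocks when fewer fields are queried. The actual savings depend on the storage engine and on how the slices are chosen.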