Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (3): 112-116.

• Database, Data Mining, Machine Learning •

Compression algorithm LZW on Chinese text

CHEN Qinghui1,2, CHEN Xiaosong1, HAN Deliang1

  1. School of Mathematics and Statistics, Central South University, Changsha 410083, China
  2. School of Business, Central South University, Changsha 410083, China
  • Online: 2014-02-01 Published: 2014-01-26

Abstract: This paper presents LZW_CH, a compression algorithm for Chinese text derived from the LZW algorithm. Chinese text differs structurally from English text in three ways: the way Chinese characters are encoded, the large character set, and the short length of repeated strings. To account for these differences, the modifications to LZW concern how input data is read, the dictionary size and basic code set, and how dictionary codes are output. On Chinese text, LZW_CH achieves a compression ratio about 19% higher on average than that of LZW19, with compression and decompression speeds comparable to LZW19, and it requires no pre-processing of the data to be compressed. As a stand-alone compression algorithm, its average compression ratio on longer Chinese texts approaches or exceeds that of the professional compression utility WinRAR.

Key words: Chinese text, data compression, compression algorithm, encoding, LZW
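
To illustrate only the general idea of reading text character by character instead of byte by byte, the following is a minimal textbook LZW sketch in Python. It is not the LZW_CH algorithm of the paper, whose details the abstract does not give. The function names, the `alphabet` parameter, and the choice to seed the dictionary from a caller-supplied list of characters are all assumptions made for illustration, and codes are returned as plain integers with no bit-level output stage.

```python
def lzw_compress(text, alphabet):
    """Textbook LZW over Unicode characters (not bytes).

    `alphabet` is a hypothetical parameter: a list of every character that
    may occur in `text`, used to seed the dictionary. Returns integer codes.
    """
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            codes.append(table[current])     # emit the longest known phrase
            table[candidate] = len(table)    # learn the new phrase
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes, alphabet):
    """Inverse of lzw_compress, given the same `alphabet`."""
    if not codes:
        return ""
    table = {i: ch for i, ch in enumerate(alphabet)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Phrase referenced before the decoder has stored it.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)


sample = "中文文本压缩中文文本压缩中文文本"
symbols = sorted(set(sample))        # assumption: alphabet taken from the input
encoded = lzw_compress(sample, symbols)
decoded = lzw_decompress(encoded, symbols)
```

In this sketch each Chinese character is one symbol, so a repeated two-character string is captured by a single dictionary entry. A byte-oriented reader would instead need several dictionary steps to cover the same string in a multi-byte encoding.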