Computer Engineering and Applications, 2012, Vol. 48, Issue (10): 217-221.

• Graphics, Image and Pattern Recognition

Instruction-level optimization of H.264 encoder using SSE2 instructions

WANG Yan, XIANG Xiaoxuan, QI Yan

  1. Institute of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China
  • Publication date: 2012-04-01   Online release: 2012-04-11

Abstract: The H.264 video coding standard adopts many new techniques and achieves significantly better compression efficiency than earlier standards. However, it also greatly increases encoder complexity, which makes real-time encoding difficult. Because the parallel computing capability of the Streaming SIMD Extensions 2 (SSE2) instruction set can improve a computer's real-time processing of multimedia data, this paper applies SSE2 instruction-level optimization to several of the most time-consuming modules of the H.264 encoder: computing the sum of absolute differences (SAD) in integer-pixel motion estimation, the integer DCT, quantization, the Hadamard transform of the difference matrix, and computing the sum of absolute transformed differences (SATD) in sub-pixel motion estimation. Experimental results show that the optimized modules run faster while picture quality stays the same as with the original encoder, and that the overall encoding speed of the optimized H.264 encoder largely satisfies real-time requirements.
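To illustrate why SSE2 helps with the SAD computation listed above, here is a minimal sketch, not the authors' code. It assumes a 16×16 macroblock, 8-bit luma samples and an x86 target with SSE2; the function names `sad16x16_c` and `sad16x16_sse2` are hypothetical. The SSE2 instruction PSADBW, exposed as `_mm_sad_epu8`, takes the absolute differences of 16 unsigned bytes and sums them in one instruction, so each 16-pixel row costs one load pair and one SAD instead of 16 scalar subtract/abs/add steps.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference: sum of absolute differences over a 16x16 block.
   Strides let cur/ref point into larger frames. */
static unsigned sad16x16_c(const uint8_t *cur, int cur_stride,
                           const uint8_t *ref, int ref_stride)
{
    unsigned sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += (unsigned)abs(cur[y * cur_stride + x] - ref[y * ref_stride + x]);
    return sad;
}

/* SSE2 version: _mm_sad_epu8 computes |c - r| for 16 unsigned bytes and
   sums each group of 8 into the low bits of one of two 64-bit lanes. */
static unsigned sad16x16_sse2(const uint8_t *cur, int cur_stride,
                              const uint8_t *ref, int ref_stride)
{
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; y++) {
        /* Unaligned loads: motion-search candidates are rarely 16-byte aligned. */
        __m128i c = _mm_loadu_si128((const __m128i *)(cur + y * cur_stride));
        __m128i r = _mm_loadu_si128((const __m128i *)(ref + y * ref_stride));
        acc = _mm_add_epi64(acc, _mm_sad_epu8(c, r));
    }
    /* Add the upper 64-bit partial sum onto the lower one. */
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return (unsigned)_mm_cvtsi128_si32(acc);
}
```

Because each lane holds at most 16 × 8 × 255 = 32640, no overflow is possible before the final fold. The SSE2 result should equal the scalar one for every input, which is the property worth checking before dropping the faster version into a motion-search loop.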

Key words: H.264 encoding, sum of absolute differences (SAD), integer DCT, sum of absolute transformed differences (SATD), Streaming SIMD Extensions 2 (SSE2)
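SATD, the cost used in sub-pixel motion estimation, applies a Hadamard transform to the residual (difference) block before summing absolute values. The following is a minimal scalar sketch for a 4×4 block, with hypothetical names `hadamard4` and `satd4x4`. An SSE2 version would have to reproduce this result exactly. The 1/2 scaling that many encoders apply to the sum is deliberately left out here.

```c
#include <stdint.h>
#include <stdlib.h>

/* In-place 4-point Hadamard butterfly. The output order is not sequency
   order, which does not matter for a sum of absolute values. */
static void hadamard4(int v[4])
{
    int a = v[0] + v[1], b = v[0] - v[1];
    int c = v[2] + v[3], d = v[2] - v[3];
    v[0] = a + c; v[1] = b + d; v[2] = a - c; v[3] = b - d;
}

/* Unnormalized 4x4 SATD: Hadamard-transform the residual along rows,
   then along columns, and sum the absolute coefficients. */
static unsigned satd4x4(const uint8_t *cur, int cur_stride,
                        const uint8_t *ref, int ref_stride)
{
    int m[4][4];
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            m[y][x] = cur[y * cur_stride + x] - ref[y * ref_stride + x];

    for (int y = 0; y < 4; y++)          /* horizontal transform */
        hadamard4(m[y]);

    unsigned sum = 0;
    for (int x = 0; x < 4; x++) {        /* vertical transform + accumulate */
        int col[4] = { m[0][x], m[1][x], m[2][x], m[3][x] };
        hadamard4(col);
        for (int k = 0; k < 4; k++)
            sum += (unsigned)abs(col[k]);
    }
    return sum;
}
```

Two cases show how SATD ranks residuals differently from SAD. A constant residual of 3 over the block puts all energy in the DC coefficient, so SATD is 48 and equal to SAD. A single-pixel residual of 1 has SAD 1 but SATD 16, because its energy spreads over every coefficient.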