Computer Engineering and Applications, 2013, Vol. 49, Issue 9: 197-199.

Robust audio fingerprint using lifting-based wavelet and non-negative matrix factorization

LONG Xiaobao   

  1. School of Computer, Chongqing University, Chongqing 400030, China
  • Online: 2013-05-01 Published: 2016-03-28

Robust audio fingerprint using lifting wavelet and non-negative matrix factorization

龙小保   

  1. 重庆大学 计算机学院,重庆 400030

Abstract: A robust audio fingerprinting method using the lifting-based wavelet transform and Non-negative Matrix Factorization (NMF) is proposed for Content-Based Audio Retrieval (CBAR) applications. The original audio is divided into short frames of fixed length, and a lifting-based wavelet transform of each frame yields its low-frequency (approximation) and high-frequency (detail) coefficients. Each audio frame is then approximately represented, using NMF, as the product of a basis matrix and a coefficient matrix. The sum of each column of the coefficient matrix is calculated and quantized to produce 1 bit of the fingerprint sequence. Experimental results show that the proposed scheme is robust against common audio processing operations. It is insensitive to small local changes while still distinguishing different audio clips well, which makes it suitable for audio retrieval.

Key words: audio fingerprint, audio digest, non-negative matrix factorization, lifting-based wavelet, audio retrieval

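Abstract (translated from the Chinese): For audio retrieval applications, a robust audio fingerprinting scheme using the lifting wavelet transform and non-negative matrix factorization is proposed. The original audio is divided into frames of fixed length, and a lifting wavelet transform is applied to each frame to obtain a low-frequency approximation component and a high-frequency detail component. Non-negative matrix factorization of the detail component yields a basis matrix and a coefficient matrix that approximately represent the audio sub-frames. The elements of each column of the coefficient matrix are summed, and each column sum is quantized to give 1 bit of the fingerprint sequence of the framed audio. Experimental results show that the scheme is robust to common audio processing operations, is insensitive to local changes in the audio, distinguishes different audio well, and can be used for object-oriented audio retrieval.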

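Key words (translated from the Chinese): audio fingerprint, audio digest, non-negative matrix factorization, lifting wavelet, audio retrieval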
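
The pipeline summarized in the abstract (framing, lifting wavelet transform, NMF, column sums, 1-bit quantization) can be illustrated with a minimal sketch. It is not the author's implementation, and several details are assumptions because the abstract does not state them: a single-level Haar wavelet stands in for the lifting scheme, the NMF input matrix is built from the magnitudes of the detail coefficients arranged as sub-frame columns, the factorization uses standard multiplicative updates, and each column sum is quantized against the median of the sums. The names `frame_len`, `n_subframes`, and `rank`, and their default values, are hypothetical.

```python
import numpy as np


def haar_lifting(frame):
    """One level of the Haar wavelet computed by lifting (split, predict, update)."""
    frame = np.asarray(frame, dtype=float)
    frame = frame[: len(frame) - len(frame) % 2]   # drop a trailing odd sample
    even, odd = frame[0::2], frame[1::2]           # split
    detail = odd - even                            # predict: high-frequency part
    approx = even + detail / 2.0                   # update: pairwise mean
    return approx, detail


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF, V ~= W @ H (assumed; the abstract names no solver)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps       # basis matrix
    H = rng.random((rank, V.shape[1])) + eps       # coefficient matrix
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def frame_fingerprint(frame, n_subframes=16, rank=2):
    """Return n_subframes bits for one frame (one bit per coefficient-matrix column)."""
    _, detail = haar_lifting(frame)
    usable = len(detail) - len(detail) % n_subframes
    # Assumption: each column of V holds the detail magnitudes of one sub-frame.
    V = np.abs(detail[:usable]).reshape(n_subframes, -1).T
    _, H = nmf(V, rank)
    col_sums = H.sum(axis=0)
    # Assumption: quantize each column sum against the median of all sums.
    return (col_sums > np.median(col_sums)).astype(np.uint8)


def fingerprint(signal, frame_len=1024, **kwargs):
    """Concatenate the per-frame bits of all complete frames in the signal."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return np.zeros(0, dtype=np.uint8)
    bits = [
        frame_fingerprint(signal[i * frame_len:(i + 1) * frame_len], **kwargs)
        for i in range(n_frames)
    ]
    return np.concatenate(bits)
```

With the default values, `fingerprint(signal)` returns 16 bits per complete 1024-sample frame. The fixed random seed in `nmf` keeps the result repeatable for a given input, which matters because NMF solutions depend on the initialization.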