Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (7): 162-169. DOI: 10.3778/j.issn.1002-8331.1812-0126


Optimized Orthogonal Matching Pursuit and Short-Time Spectrum Estimation for Sound Recognition

CHEN Qiuju, XU Jianguo   

  1. Department of Wine Engineering Automation, Moutai Institute, Renhuai, Guizhou 564500, China
  2. College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
  • Online:2020-04-01 Published:2020-03-28





A sound event recognition method based on optimized Orthogonal Matching Pursuit (OMP) and short-time spectrum estimation is proposed to reduce the influence of varying environments on sound event recognition. First, Particle Swarm Optimization (PSO) is adopted to optimize OMP for sparse decomposition and reconstruction of the sound signal, which preserves the main body of the signal. Second, a short-time spectrum estimation algorithm enhances the residual signal left after the first reconstruction and uses it to compensate the first reconstructed signal, which reduces the influence of non-stationary noise and improves the precision of the reconstruction. Then, an anti-noise composite feature, called the OOMP feature, is extracted from the reconstructed signal; it combines Mel Frequency Cepstrum Coefficients (MFCC), the time-frequency OMP feature, and the Pitch feature. Finally, a Deep Belief Network (DBN) learns the OOMP feature and recognizes 40 classes of sound events in different environments and at different signal-to-noise ratios (SNRs). The mean recognition rate reaches 70.44% across environments and SNRs, and 49.9% even at -5 dB. The experimental results show that the proposed method can effectively recognize sound events in various environments.

Key words: sound event recognition, Orthogonal Matching Pursuit (OMP), Particle Swarm Optimization (PSO), short-time spectrum estimation, Deep Belief Network (DBN)


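Sound event recognition is affected by various environmental sounds. Reconstructing the sound signal twice, using optimized Orthogonal Matching Pursuit (OMP) and short-time spectrum estimation, effectively improves recognition performance. Particle Swarm Optimization (PSO) is used to optimize the OMP sparse decomposition for the first reconstruction, which retains the main body of the sound signal. Short-time spectrum estimation is then applied to enhance the residual signal left after the first reconstruction; this completes the second reconstruction, removes non-stationary noise, and improves the precision of the reconstructed signal. Mel Frequency Cepstrum Coefficient (MFCC) features, optimized OMP time-frequency features, and fundamental frequency (Pitch) features are extracted from the reconstructed signal to form the composite anti-noise feature set OOMP. A Deep Belief Network (DBN) learns the OOMP features and recognizes 40 classes of sound events in different environments at different signal-to-noise ratios (SNRs). Experimental results show that the method achieves a mean recognition rate of 70.44% across environmental sounds at different SNRs and still reaches 49.90% at -5 dB, which indicates that the proposed method can effectively recognize sound events in various environments.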

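Key words: sound event recognition, orthogonal matching pursuit, particle swarm optimization, short-time spectrum estimation, deep belief network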
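
The sparse decomposition step described in the abstract can be illustrated with a minimal sketch of plain OMP. This is not the authors' implementation: the PSO-driven atom search is replaced here by an exhaustive search over a small fixed dictionary, and the cosine dictionary, the function names `cosine_dictionary` and `omp`, and the test signal are all assumptions made for illustration only.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def cosine_dictionary(n):
    """Hypothetical dictionary: n unit-norm DCT-II cosine atoms of length n."""
    atoms = []
    for k in range(n):
        atom = [math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)]
        norm = math.sqrt(dot(atom, atom))
        atoms.append([x / norm for x in atom])
    return atoms


def omp(signal, atoms, n_atoms, tol=1e-10):
    """Greedy OMP over a fixed dictionary of unit-norm atoms.

    Returns (selected atom indices, reconstruction, residual).
    Exhaustive atom search stands in for the PSO search (assumption).
    """
    residual = list(signal)
    basis = []      # orthonormalised versions of the selected atoms
    selected = []   # indices of the selected atoms, in selection order
    for _ in range(n_atoms):
        # Score every unused atom by its correlation with the residual.
        scores = [0.0 if i in selected else abs(dot(residual, a))
                  for i, a in enumerate(atoms)]
        best = max(range(len(atoms)), key=scores.__getitem__)
        if scores[best] <= tol:
            break  # nothing left that the dictionary can explain
        # Gram-Schmidt: orthogonalise the new atom against earlier picks.
        q = list(atoms[best])
        for b in basis:
            c = dot(q, b)
            q = [x - c * y for x, y in zip(q, b)]
        norm = math.sqrt(dot(q, q))
        if norm <= tol:
            break  # atom is linearly dependent on earlier picks
        q = [x / norm for x in q]
        basis.append(q)
        selected.append(best)
        # Projecting out q keeps the residual orthogonal to all picks,
        # which is what distinguishes OMP from plain matching pursuit.
        c = dot(residual, q)
        residual = [r - c * x for r, x in zip(residual, q)]
    reconstruction = [s - r for s, r in zip(signal, residual)]
    return selected, reconstruction, residual


# Toy signal built from two dictionary atoms (illustrative only).
atoms = cosine_dictionary(64)
signal = [3.0 * a + 1.5 * b for a, b in zip(atoms[5], atoms[12])]
selected, reconstruction, residual = omp(signal, atoms, n_atoms=2)
print("selected atoms:", selected)
print("residual energy:", dot(residual, residual))
```

In the pipeline the abstract outlines, `reconstruction` would play the role of the first reconstructed signal, and `residual` is the part that the short-time spectrum estimation stage would then enhance and add back. Because the toy dictionary is orthogonal, the Gram-Schmidt step has little to do here; it matters for overcomplete dictionaries, where atoms are correlated.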