Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (12): 126-131. DOI: 10.3778/j.issn.1002-8331.2101-0225

• Pattern Recognition and Artificial Intelligence •

Sampling and Counting Audio Retrieval Method Resistant to Frequency Transformation

姚姗姗,牛保宁   

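  1. Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006
  2. School of Information and Computer, Taiyuan University of Technology, Jinzhong, Shanxi 030600
  • Publication date: 2021-06-15 Release date: 2021-06-10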

Sampling and Counting Audio Retrieval Method Resistant to Pitch-Shift

YAO Shanshan, NIU Baoning   

  1. Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China
  2. School of Information and Computer, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Online: 2021-06-15 Published: 2021-06-10

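Abstract:

An ideal audio retrieval method can accurately and efficiently identify any audio clip in a large-scale audio database. However, no current method is robust to every kind of noise interference. The sampling and counting audio retrieval method based on the Philips fingerprint is among the most efficient methods available; if its inability to resist linear transformations (time scaling and frequency transformation) could be overcome, the sampling and counting approach as a whole would move closer to the ideal. To address the frequency transformation problem, this paper proposes a sampling and counting audio retrieval method resistant to frequency transformation, comprising a query fingerprint generation method with variable frequency band intervals, a query matching method over multiple frequency scales, and two acceleration strategies: step-by-step fingerprint extraction and a variable filtering threshold. The method resists frequency transformations from 70% to 130%, performs comparably to QUAD, currently the best method, and can be extended to any retrieval method that uses Philips-like fingerprints to strengthen its resistance to frequency transformation interference.

Keywords: audio retrieval, audio fingerprint, frequency transformation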

Abstract:

An ideal audio retrieval method can identify any audio clip in a large-scale audio database accurately and efficiently. However, no existing method is robust to all types of distortion. The sampling and counting method based on the Philips fingerprint is one of the most efficient methods at present. If it could resist linear speed changes, namely time-stretch and pitch-shift, it would come closer to an ideal retrieval method. To solve the pitch-shift problem, a sampling and counting method resistant to pitch-shift is proposed. It consists of a method for generating the query fingerprint with changed frequency band intervals, a method for matching the query at multiple pitch scales, and two acceleration strategies: fingerprint extraction in steps and adjustable filtering thresholds. The proposed method can resist pitch-shift from 70% to 130%, and its performance is equivalent to that of the state-of-the-art QUAD method. Moreover, it can be extended to any retrieval method that uses a Philips-like fingerprint to enhance its resistance to pitch-shift.

Key words: audio retrieval, audio fingerprint, pitch-shift
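
The abstract names two ideas, query fingerprints built with changed frequency band intervals and matching at multiple pitch scales, without giving the algorithm. The Python sketch below is one possible reading of them, not the authors' implementation. It assumes the standard Philips layout of 33 logarithmically spaced bands between 300 Hz and 2000 Hz, and it assumes that a pitch-shift by a factor `scale` can be compensated by multiplying the query's band edges by `scale`. All function names, the candidate scale list, and the bit-error-rate criterion are hypothetical choices made for illustration; the paper's sampling and counting search and its two acceleration strategies are not modelled.

```python
import math


def band_edges(scale=1.0, n_bands=33, f_lo=300.0, f_hi=2000.0):
    """Return n_bands + 1 log-spaced band edges in Hz, multiplied by `scale`.

    Assumption: a pitch-shift by `scale` is undone by scaling the edges.
    """
    ratio = (f_hi / f_lo) ** (1.0 / n_bands)
    return [scale * f_lo * ratio ** k for k in range(n_bands + 1)]


def band_energies(power, bin_hz, edges):
    """Sum the power-spectrum bins that fall in each band [lo, hi)."""
    energies = []
    for lo, hi in zip(edges, edges[1:]):
        first = math.ceil(lo / bin_hz)
        last = math.ceil(hi / bin_hz)
        energies.append(sum(power[first:last]))
    return energies


def sub_fingerprint(prev, cur):
    """Philips-style bits: band-energy difference, differenced over time."""
    return [
        1 if (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1]) > 0 else 0
        for m in range(len(cur) - 1)
    ]


def fingerprint(frames, bin_hz, scale=1.0):
    """Fingerprint a list of power-spectrum frames using scaled band edges."""
    edges = band_edges(scale)
    energies = [band_energies(frame, bin_hz, edges) for frame in frames]
    return [
        sub_fingerprint(energies[n - 1], energies[n])
        for n in range(1, len(energies))
    ]


def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits between two equally sized fingerprints."""
    total = sum(len(row) for row in fp_a)
    errors = sum(
        x != y for row_a, row_b in zip(fp_a, fp_b) for x, y in zip(row_a, row_b)
    )
    return errors / total


def match_multi_scale(query_frames, ref_fp, bin_hz, scales):
    """Re-fingerprint the query at each candidate scale and keep the best.

    Returns (best_scale, best_bit_error_rate).
    """
    results = [
        (bit_error_rate(fingerprint(query_frames, bin_hz, s), ref_fp), s)
        for s in scales
    ]
    best_ber, best_scale = min(results)
    return best_scale, best_ber
```

In this reading the reference database keeps its ordinary fingerprints and only the query is re-fingerprinted per candidate scale, which is why the cost grows with the number of scales tried and why acceleration strategies such as those mentioned in the abstract would matter.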