Computer Engineering and Applications, 2023, Vol. 59, Issue (19): 99-105. DOI: 10.3778/j.issn.1002-8331.2206-0200

Pattern Recognition and Artificial Intelligence

Sound Event Localization and Detection Based on Dual Attention

XU Chundong, LIU Hao, MIN Yuan, ZHEN Yadi   

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  Online: 2023-10-01; Published: 2023-10-01


Abstract: In recent years, sound event localization and detection (SELD) has been widely applied in many fields. Deep-learning-based SELD network models struggle to accurately capture the spatial and channel information of the input feature map, which makes both localization and detection difficult. This paper proposes CECANet (coordinate and efficient channel attention network), an attention-based network model. First, a coordinate attention module is introduced into the residual module so that the network focuses more on the spatial coordinate information of the feature map; then an efficient channel attention module is added after the average pooling layer so that the network focuses more on the channel information between features. Experimental results on the TAU-NIGENS Spatial Sound Events 2021 dataset show that the proposed model improves overall performance compared with the baseline model: the F1 score (F1) and localization recall (LR) increase to 0.720 and 0.728, and the error rate (ER) and localization error (LE) decrease to 0.393 and 11.71°.

Key words: sound event localization and detection, attention mechanism, convolutional neural network, deep learning
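The abstract combines two standard attention blocks. The NumPy sketch below shows, under stated assumptions, how each one re-weights a single feature map of shape (channels, time, frequency). It follows the general coordinate attention and ECA-Net designs rather than the exact CECANet configuration, which the abstract does not specify: batch normalization and biases are omitted, the weights are random placeholders, and the function names, the (C, H, W) layout and the reduction width `c_mid` are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def h_swish(x):
    # Hard-swish nonlinearity, as used in the original coordinate attention design.
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Coordinate attention on one feature map x of shape (C, H, W).

    w_reduce: (C_mid, C) shared 1x1 conv; w_h, w_w: (C, C_mid) 1x1 convs.
    Assumption: batch norm and biases are omitted for brevity.
    """
    _, h, _ = x.shape
    z_h = x.mean(axis=2)  # (C, H): average-pool along the width axis
    z_w = x.mean(axis=1)  # (C, W): average-pool along the height axis
    # A 1x1 conv on a (C, L) map is a matrix product over the channel axis.
    f = h_swish(w_reduce @ np.concatenate([z_h, z_w], axis=1))  # (C_mid, H+W)
    f_h, f_w = f[:, :h], f[:, h:]
    g_h = sigmoid(w_h @ f_h)  # (C, H): gate for each position along height
    g_w = sigmoid(w_w @ f_w)  # (C, W): gate for each position along width
    return x * g_h[:, :, None] * g_w[:, None, :]


def eca_kernel_size(channels, gamma=2, b=1):
    # Adaptive odd kernel size k = |(log2(C) + b) / gamma|_odd from ECA-Net.
    t = int(abs((np.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def efficient_channel_attention(x, kernel):
    """ECA on x of shape (C, H, W) with a 1D kernel of odd length k."""
    y = x.mean(axis=(1, 2))  # (C,): global average pooling
    k = len(kernel)
    y_pad = np.pad(y, k // 2)  # zero-pad so the output keeps C entries
    # 1D convolution across neighbouring channels, with no dimensionality reduction.
    s = np.array([y_pad[i:i + k] @ kernel for i in range(len(y))])
    return x * sigmoid(s)[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, c_mid = 64, 8  # hypothetical channel widths
    x = rng.standard_normal((c, 10, 16))  # hypothetical (channels, frames, bins)
    y = coordinate_attention(
        x,
        0.1 * rng.standard_normal((c_mid, c)),
        0.1 * rng.standard_normal((c, c_mid)),
        0.1 * rng.standard_normal((c, c_mid)),
    )
    y = efficient_channel_attention(y, 0.1 * rng.standard_normal(eca_kernel_size(c)))
    print(y.shape)  # (64, 10, 16)
```

The difference between the two modules shows in the gate shapes: coordinate attention builds separate gates along the two spatial axes (`g_h`, `g_w`), so positional information along each axis is preserved, while ECA produces one scalar per channel from a small 1D convolution over neighbouring channels instead of a bottleneck of fully connected layers.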
