Computer Engineering and Applications, 2023, Vol. 59, Issue (3): 94-103. DOI: 10.3778/j.issn.1002-8331.2206-0113

• Pattern Recognition and Artificial Intelligence •

Music Emotion Recognition Fusion on CNN-BiLSTM and Self-Attention Model

ZHONG Zhipeng, WANG Hailong, SU Guibin, LIU Lin, PEI Dongmei   

  1. College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China
  • Online: 2023-02-01 Published: 2023-02-01

融合CNN-BiLSTM和自注意力模型的音乐情感识别

钟智鹏,王海龙,苏贵斌,柳林,裴冬梅   

  1. 内蒙古师范大学 计算机科学技术学院,呼和浩特 010022

Abstract: As research in music science and technology advances, music emotion recognition has been widely applied in music recommendation, music psychotherapy, sound-and-light scene construction, and other areas. This paper simulates the process by which humans perceive the emotions expressed in music. To address the long-distance dependency problem and the low training efficiency of long short-term memory (LSTM) neural networks in music emotion recognition, a new network model, CBSA (CNN BiLSTM Self Attention), is proposed and applied to regression training for long-distance music emotion recognition. The model uses a two-dimensional convolutional neural network to obtain the local key features of music emotion, uses a bidirectional long short-term memory neural network to extract serialized music emotion information from those local key features, and uses a self-attention model to dynamically adjust the weights of the serialized information so as to highlight the global key points of music emotion. The experimental results show that the CBSA model shortens the training time needed to learn the data patterns in music emotion information and effectively improves the accuracy of music emotion recognition.

Key words: music emotion recognition, two-dimensional convolutional neural network, bidirectional long short-term memory neural network, self-attention model
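
The self-attention step described in the abstract, which reweights the serialized features so that the globally important time steps stand out, can be illustrated with a minimal sketch. The abstract does not give the exact attention formulation, so this assumes plain scaled dot-product attention with identity query/key/value projections; the function names and the toy `frames` values (standing in for BiLSTM outputs) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_attention(sequence):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: queries, keys, and values are the inputs themselves
    (no learned projections), which keeps the sketch dependency-free.
    Returns the reweighted sequence and the attention weight matrix.
    """
    dim = len(sequence[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in sequence:
        # Similarity of this time step to every time step in the sequence.
        scores = [dot(query, key) / scale for key in sequence]
        row = softmax(scores)
        # Weighted sum of all time steps, one component at a time.
        context = [
            sum(w * value[i] for w, value in zip(row, sequence))
            for i in range(dim)
        ]
        weights.append(row)
        contexts.append(context)
    return contexts, weights


# Hypothetical per-time-step features, e.g. outputs of a BiLSTM layer.
frames = [[0.1, 0.0, 0.2], [0.9, 0.8, 0.7], [0.2, 0.1, 0.0]]
contexts, attn = self_attention(frames)
```

In this toy input the second time step has the largest activations, so every row of `attn` assigns it the highest weight. A trained model would instead learn projection matrices that decide which time steps matter.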

摘要: 随着音乐科技研究的不断深入,音乐情感识别已被广泛实践和应用在音乐推荐、音乐心理治疗、声光场景构建等方面。模拟人类感受音乐表现情感的过程,针对音乐情感识别中长短时记忆神经网络的长距离依赖和训练效率低的问题,提出一种新的网络模型CBSA(CNN BiLSTM self attention),应用于长距离音乐情感识别回归训练。模型使用二维卷积神经网络获取音乐情感局部关键特征,采用双向长短时记忆神经网络从获取的局部关键特征中提取序列化音乐情感信息,利用自注意力模型对获取的序列化信息进行动态权重调整,突出音乐情感全局关键点。实验结果表明,CBSA模型可缩短分析音乐情感信息中数据规律的训练时间,有效地提高音乐情感识别精确度。

关键词: 音乐情感识别, 二维卷积神经网络, 双向长短时记忆神经网络, 自注意力模型