Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (2): 34-47. DOI: 10.3778/j.issn.1002-8331.2205-0492
PEI Wenbin, WANG Hailong, LIU Lin, PEI Dongmei
Online: 2023-01-15
Published: 2023-01-15
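Abstract: Efficient and accurate musical instrument recognition can advance research on sound source separation, music transcription, and music genre classification, and it has broad applications in playlist generation, acoustic environment classification, intelligent instrument teaching, and interactive multimedia. In recent years, as research on instrument recognition has progressed, the performance of recognition systems has improved considerably, yet many problems remain: some instruments are difficult to recognize, instrument audio features are hard to extract, and recognition accuracy for polyphonic instruments is low. How to use artificial intelligence to recognize instruments efficiently and accurately has become both a focus and a difficulty of current research. In view of the current state of research, this paper reviews the field from three aspects: the audio features commonly used for instrument recognition, recognition models and methods, and commonly used datasets. It then summarizes the limitations of current work and future development trends, providing a reference for research on instrument recognition.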
PEI Wenbin, WANG Hailong, LIU Lin, PEI Dongmei. Review of Musical Instrument Recognition in Music Information Retrieval[J]. Computer Engineering and Applications, 2023, 59(2): 34-47.