[1] MIKOLOV T,CHEN K,CORRADO G,et al.Efficient estimation of word representations in vector space[J].arXiv:1301.3781,2013.
[2] MIKOLOV T,SUTSKEVER I,CHEN K,et al.Distributed representations of words and phrases and their compositionality[C]//Advances in Neural Information Processing Systems,2013:3111-3119.
[3] DEVLIN J,CHANG M W,LEE K,et al.BERT:pre-training of deep bidirectional transformers for language understanding[J].arXiv:1810.04805,2018.
[4] RAO T,LI X,ZHANG H,et al.Multi-level region-based convolutional neural network for image emotion classification[J].Neurocomputing,2019,333:429-439.
[5] YANG J,SHE D,SUN M,et al.Visual sentiment prediction based on automatic discovery of affective regions[J].IEEE Transactions on Multimedia,2018,20(9):2513-2525.
[6] SONG K,YAO T,LING Q,et al.Boosting image sentiment analysis with visual attention[J].Neurocomputing,2018,312:218-228.
[7] LIU X.Multimodal public opinion analysis model integrating local semantic information[J].Information Security Research,2019,5(4):340-345.
[8] HU H J,FENG M Y,CAO M L,et al.Multimodal social sentiment analysis based on semantic correlation[J].Journal of Beijing University of Aeronautics and Astronautics,2021,47(3):469-477.
[9] SHUSTER K,HUMEAU S,HU H,et al.Engaging image captioning via personality[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,2019:12516-12526.
[10] VINYALS O,TOSHEV A,BENGIO S,et al.Show and tell:a neural image caption generator[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2015:3156-3164.
[11] HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[J].arXiv:1512.03385,2015.
[12] LIU Y,OTT M,GOYAL N,et al.RoBERTa:a robustly optimized BERT pretraining approach[J].arXiv:1907.11692,2019.
[13] NGUYEN D Q,VU T,NGUYEN A T.BERTweet:a pre-trained language model for English Tweets[J].arXiv:2005.10200,2020.
[14] KHAN Z,FU Y.Exploiting BERT for multimodal target sentiment classification through input space translation[C]//Proceedings of the 29th ACM International Conference on Multimedia,2021:3034-3042.
[15] LU D,NEVES L,CARVALHO V,et al.Visual attention model for name tagging in multimodal social media[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics(Volume 1:Long Papers),2018:1990-1999.
[16] ZHANG Q,FU J,LIU X,et al.Adaptive co-attention network for named entity recognition in tweets[C]//Thirty-Second AAAI Conference on Artificial Intelligence,2018.
[17] YU J,JIANG J.Adapting BERT for target-oriented multimodal sentiment classification[C]//IJCAI,2019.
[18] LIN T Y,MAIRE M,BELONGIE S,et al.Microsoft COCO:common objects in context[C]//European Conference on Computer Vision.Cham:Springer,2014:740-755.
[19] MATHEWS A,XIE L,HE X.SentiCap:generating image descriptions with sentiments[C]//Proceedings of the AAAI Conference on Artificial Intelligence,2016.