[1] LAKOFF G, JOHNSON M. Metaphors we live by[M]. Chicago: University of Chicago Press, 2003.
[2] FASS D. Met*: a method for discriminating metonymy and metaphor by computer[J]. Computational Linguistics, 1991, 17(1): 49-90.
[3] SHUTOVA E, KIELA D, MAILLARD J. Black holes and white rabbits: metaphor identification with visual features[C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2016: 160-170.
[4] SU C, CHEN W J, FU Z, et al. Multimodal metaphor detection based on distinguishing concreteness[J]. Neurocomputing, 2021, 429: 166-173.
[5] KEHAT G, PUSTEJOVSKY J. Improving neural metaphor detection with visual datasets[C]//Proceedings of the 12th International Conference on Language Resources and Evaluation, 2020: 5928-5933.
[6] XU B, LI T T, ZHENG J Z, et al. MET-Meme: a multimodal meme dataset rich in metaphors[C]//Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2022: 2887-2899.
[7] CAI Y T, CAI H Y, WAN X J. Multi-modal sarcasm detection in Twitter with hierarchical fusion model[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 2506-2515.
[8] HEINTZ I, GABBARD R, SRIVASTAVA M, et al. Automatic extraction of linguistic metaphors with LDA topic modeling[C]//Proceedings of the 1st Workshop on Metaphor in NLP, 2013: 58-66.
[9] KÖPER M, SCHULTE IM WALDE S. Improving verb metaphor detection by propagating abstractness to words, phrases and individual senses[C]//Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and Their Applications. Stroudsburg: ACL, 2017: 24-30.
[10] STRZALKOWSKI T, BROADWELL G A, TAYLOR S, et al. Robust extraction of metaphor from novel data[C]//Proceedings of the 1st Workshop on Metaphor in NLP, 2013: 67-76.
[11] SHUTOVA E, SUN L, GUTIÉRREZ E D, et al. Multilingual metaphor processing: experiments with semi-supervised and unsupervised learning[J]. Computational Linguistics, 2017, 43(1): 71-123.
[12] MAO R, LIN C H, GUERIN F. Word embedding and WordNet based metaphor identification and interpretation[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2018: 1222-1231.
[13] PRAMANICK M, MITRA P. Unsupervised detection of metaphorical adjective-noun pairs[C]//Proceedings of the 2018 Workshop on Figurative Language Processing. Stroudsburg: ACL, 2018: 76-80.
[14] REI M, BULAT L, KIELA D, et al. Grasping the finer point: a supervised similarity network for metaphor detection[J]. arXiv:1709.00575, 2017.
[15] BIZZONI Y, GHANIMIFARD M. Bigrams and BiLSTMs: two neural networks for sequential metaphor detection[C]//Proceedings of the 2018 Workshop on Figurative Language Processing. Stroudsburg: ACL, 2018: 91-101.
[16] TANASESCU C, KESARWANI V, INKPEN D. Metaphor detection by deep learning and the place of poetic metaphor in digital humanities[C]//Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, 2018: 122-127.
[17] FORCEVILLE C, URIOS-APARISI E. Multimodal metaphor[M]. Berlin: Mouton de Gruyter, 2009.
[18] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30, 2017: 5998-6008.
[19] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014.
[20] KUMAR S, KULKARNI A, AKHTAR M S, et al. When did you become so smart, oh wise one?! sarcasm explanation in multi-modal multi-party dialogues[J]. arXiv:2203.06419, 2022.
[21] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv:1810.04805, 2018.
[22] SUN L C, LIAN Z, LIU B, et al. Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis[J]. IEEE Transactions on Affective Computing, 2024, 15(1): 309-325.
[23] BA J L, KIROS J R, HINTON G E. Layer normalization[J]. arXiv:1607.06450, 2016.
[24] PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library[C]//Advances in Neural Information Processing Systems 32, 2019.
[25] KINGMA D P, BA J. Adam: a method for stochastic optimization[J]. arXiv:1412.6980, 2014.
[26] LEWIS M, LIU Y, GOYAL N, et al. BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension[J]. arXiv:1910.13461, 2019.
[27] CLARK K, LUONG M T, LE Q V, et al. ELECTRA: pre-training text encoders as discriminators rather than generators[J]. arXiv:2003.10555, 2020.
[28] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
[29] CHEN X, ZHANG N Y, LI L, et al. Hybrid transformer with multi-level fusion for multimodal knowledge graph completion[C]//Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2022: 904-915.
[30] TSAI Y H, BAI S J, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 6558-6569.
[31] YANG B, SHAO B, WU L J, et al. Multimodal sentiment analysis with unidirectional modality translation[J]. Neurocomputing, 2022, 467: 130-137.
[32] XU N, ZENG Z X, MAO W J. Reasoning with multimodal sarcastic tweets via modeling cross-modality contrast and semantic association[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 3777-3786.
[33] PAN H L, LIN Z, FU P, et al. Modeling intra and inter-modality incongruity for multi-modal sarcasm detection[C]//Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg: ACL, 2020: 1383-1392.
[34] WANG X Y, SUN X W, YANG T, et al. Building a bridge: a method for image-text sarcasm detection without pretraining on image-text data[C]//Proceedings of the 1st International Workshop on Natural Language Processing Beyond Text. Stroudsburg: ACL, 2020: 19-29.
[35] RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[J]. Journal of Machine Learning Research, 2020, 21(140): 1-67.
[36] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[J]. arXiv:2010.11929, 2020.
[37] TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image transformers & distillation through attention[C]//Proceedings of the 38th International Conference on Machine Learning, 2021: 10347-10357.
[38] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]//Proceedings of the 38th International Conference on Machine Learning, 2021: 8748-8763.
[39] KIM W, SON B, KIM I. ViLT: vision-and-language transformer without convolution or region supervision[C]//Proceedings of the 38th International Conference on Machine Learning, 2021: 5583-5594.