[1] GRISHMAN R, SUNDHEIM B. Message understanding conference-6: a brief history[C]//Proceedings of the 16th Conference on Computational Linguistics, 1996: 466.
[2] TJONG KIM SANG E F, DE MEULDER F. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition[C]//Proceedings of the 7th Conference on Natural Language Learning, 2003: 142-147.
[3] LI L, XI X F, SHENG S L, et al. Research progress on Chinese named entity recognition based on deep learning[J]. Computer Engineering and Applications, 2023, 59(24): 46-69.
[4] LI J, SUN A X, HAN J L, et al. A survey on deep learning for named entity recognition[J]. IEEE Transactions on Knowledge and Data Engineering, 2022, 34(1): 50-70.
[5] ZHAO J G, QIAN Y R, WANG K, et al. Survey of Chinese named entity recognition research[J]. Computer Engineering and Applications, 2024, 60(1): 15-27.
[6] MOON S, NEVES L, CARVALHO V. Multimodal named entity recognition for short social media posts[J]. arXiv:1802.07862, 2018.
[7] XU B, HUANG S Z, SHA C F, et al. MAF: a general matching and alignment framework for multimodal named entity recognition[C]//Proceedings of the 15th ACM International Conference on Web Search and Data Mining. New York: ACM, 2022: 1215-1223.
[8] WANG X Y, GUI M, JIANG Y, et al. ITA: image-text alignments for multi-modal named entity recognition[J]. arXiv:2112.06482, 2021.
[9] WANG P, CHEN X H, SHANG Z Y, et al. Multimodal named entity recognition with bottleneck fusion and contrastive learning[J]. IEICE Transactions on Information and Systems, 2023, E106.D(4): 545-555.
[10] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6000-6010.
[11] YU J F, JIANG J, YANG L, et al. Improving multimodal named entity recognition via entity span detection with unified multimodal transformer[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 3342-3352.
[12] ZHANG D, WEI S Z, LI S S, et al. Multi-modal graph fusion for named entity recognition with targeted visual guidance[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2021: 14347-14355.
[13] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]//Proceedings of the International Conference on Machine Learning, 2021: 8748-8763.
[14] TSAI Y H, BAI S, LIANG P, et al. Multimodal Transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 6558-6569.
[15] LAFFERTY J, MCCALLUM A, PEREIRA F. Conditional random fields: probabilistic models for segmenting and labeling sequence data[C]//Proceedings of the 18th International Conference on Machine Learning, 2001: 282-289.
[16] COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch[J]. Journal of Machine Learning Research, 2011, 12: 2493-2537.
[17] HUANG Z, XU W, YU K. Bidirectional LSTM-CRF models for sequence tagging[J]. arXiv:1508.01991, 2015.
[18] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[J]. arXiv:1810.04805, 2018.
[19] CONNEAU A, KHANDELWAL K, GOYAL N, et al. Unsupervised cross-lingual representation learning at scale[J]. arXiv:1911.02116, 2019.
[20] LI X J, SUN G L, LIU X Y. ESPVR: entity spans position visual regions for multimodal named entity recognition[C]//Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg: ACL, 2023: 7785-7794.
[21] CHEN X, ZHANG N, LI L, et al. Good visual guidance makes a better extractor: hierarchical visual prefix for multimodal entity and relation extraction[J]. arXiv:2205.03521, 2022.
[22] ZHOU B H, ZHANG Y, SONG K H, et al. A span-based multimodal variational autoencoder for semi-supervised multimodal named entity recognition[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2022: 6293-6302.
[23] LIU L P, WANG M L, ZHANG M Z, et al. UAMNer: uncertainty-aware multimodal named entity recognition in social media posts[J]. Applied Intelligence, 2022, 52(4): 4109-4125.
[24] SUN L, WANG J Q, SU Y D, et al. RIVA: a pre-trained tweet multimodal model based on text-image relation for multimodal NER[C]//Proceedings of the 28th International Conference on Computational Linguistics, 2020: 1852-1862.
[25] JIA M, SHEN L, SHEN X, et al. MNER-QG: an end-to-end MRC framework for multimodal named entity recognition with query grounding[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2023: 8032-8040.
[26] WANG J, YANG Y, LIU K Y, et al. M3S: scene graph driven multi-granularity multi-task learning for multi-modal NER[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 31: 111-120.
[27] GONG Y C, LV X Q, YUAN Z, et al. GNN-based multimodal named entity recognition[J]. The Computer Journal, 2024, 67(8): 2622-2632.
[28] ZHANG Z X, CHEN J Y, LIU X J, et al. ‘What’ and ‘where’ both matter: dual cross-modal graph convolutional networks for multimodal named entity recognition[J]. International Journal of Machine Learning and Cybernetics, 2024, 15(6): 2399-2409.
[29] WANG X W, TIAN J F, GUI M, et al. PromptMNER: prompt-based entity-related visual clue extraction and integration for multimodal named entity recognition[C]//Proceedings of the International Conference on Database Systems for Advanced Applications, 2022: 297-305.
[30] LI J, LI H, SUN D, et al. LLMs as bridges: reformulating grounded multimodal named entity recognition[J]. arXiv:2402.09989, 2024.
[31] ALAYRAC J B, DONAHUE J, LUC P, et al. Flamingo: a visual language model for few-shot learning[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems, 2022: 23716-23736.
[32] LI J, LI D, XIONG C, et al. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation[C]//Proceedings of the International Conference on Machine Learning, 2022: 12888-12900.
[33] LI J, LI D, SAVARESE S, et al. BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models[C]//Proceedings of the International Conference on Machine Learning, 2023: 19730-19742.
[34] TOUVRON H, LAVRIL T, IZACARD G, et al. LLaMA: open and efficient foundation language models[J]. arXiv:2302.13971, 2023.
[35] TJONG KIM SANG E F, VEENSTRA J. Representing text chunks[C]//Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics, 1999: 173-179.
[36] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
[37] ZHANG Q, FU J L, LIU X Y, et al. Adaptive co-attention network for named entity recognition in tweets[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2018: 5674-5681.
[38] LU D, NEVES L, CARVALHO V, et al. Visual attention model for name tagging in multimodal social media[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2018: 1990-1999.
[39] MA X, HOVY E. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF[J]. arXiv:1603.01354, 2016.
[40] SOUZA F, NOGUEIRA R, LOTUFO R. BERTimbau: pretrained BERT models for Brazilian Portuguese[C]//Proceedings of the 9th Brazilian Conference on Intelligent Systems, 2020: 403-417.
[41] LIU P P, WANG G S, LI H, et al. Multi-granularity cross-modal representation learning for named entity recognition on social media[J]. Information Processing & Management, 2024, 61(1): 103546.