Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (22): 18-37. DOI: 10.3778/j.issn.1002-8331.2404-0285
YAO Yi, CHEN Zhaoyang, DU Xiaoming, YAO Tianlei, LI Qingshang, SUN Mingwei
Online: 2024-11-15
Published: 2024-11-14
Abstract: With the growing diversity of data resources and the development of large model technology, the multimodal knowledge graph (MMKG), capable of handling multi-source heterogeneous data, has attracted wide attention for its outstanding data processing and management capabilities. Combining domain requirements and characteristics, this paper presents an overall survey of MMKG construction technology and its application in the military field. Building on the basic concepts of traditional text-based knowledge graphs, it reviews the fundamental concepts and research status of MMKGs, analyzes and summarizes three key technologies for MMKG construction, namely multimodal information extraction, multimodal entity linking and multimodal representation learning, together with the use of large model technology in the MMKG construction process, and discusses application scenarios of MMKGs in the military field. Finally, in light of current large model trends and military requirements, it summarizes and looks ahead to the development prospects of MMKG construction technology and its military applications.
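To make the notion of a multimodal knowledge graph described in the abstract more concrete, the following minimal Python sketch, added purely for illustration and not taken from the paper, models entities that carry attachments from several modalities alongside ordinary relation triples; all entity names, relations and file paths in it are hypothetical.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Entity:
    name: str
    # modality name -> resource (free text, or a URI pointing to an image/audio file)
    modalities: Dict[str, str] = field(default_factory=dict)

@dataclass
class MultimodalKG:
    entities: Dict[str, Entity] = field(default_factory=dict)
    triples: List[Tuple[str, str, str]] = field(default_factory=list)  # (head, relation, tail)

    def add_entity(self, name: str, **modalities: str) -> None:
        self.entities[name] = Entity(name, dict(modalities))

    def add_triple(self, head: str, relation: str, tail: str) -> None:
        self.triples.append((head, relation, tail))

    def neighbors(self, name: str, relation: str) -> List[Entity]:
        # Tail entities reachable from `name` via `relation`.
        return [self.entities[t] for h, r, t in self.triples
                if h == name and r == relation and t in self.entities]

if __name__ == "__main__":
    kg = MultimodalKG()
    # Hypothetical equipment entities, each grounded in a text description and an image resource.
    kg.add_entity("Tank-A", text="main battle tank", image="images/tank_a.jpg")
    kg.add_entity("Engine-B", text="diesel power pack", image="images/engine_b.jpg")
    kg.add_triple("Tank-A", "hasComponent", "Engine-B")
    for e in kg.neighbors("Tank-A", "hasComponent"):
        print(e.name, e.modalities)

In a full MMKG construction pipeline of the kind the survey covers, such text attachments would come from multimodal information extraction, the mapping of mentions to entities from multimodal entity linking, and the modality attachments would further be projected into a shared embedding space by multimodal representation learning.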
YAO Yi, CHEN Zhaoyang, DU Xiaoming, YAO Tianlei, LI Qingshang, SUN Mingwei. Survey of Multimodal Knowledge Graph Construction Technology and Its Application in Military Field[J]. Computer Engineering and Applications, 2024, 60(22): 18-37.