Computer Engineering and Applications, 2022, Vol. 58, Issue (17): 50-60. DOI: 10.3778/j.issn.1002-8331.2203-0243
Research Hotspots and Reviews
WANG Ruiping, WU Shihong, ZHANG Meihang, WANG Xiaoping
Online: 2022-09-01
Published: 2022-09-01
WANG Ruiping, WU Shihong, ZHANG Meihang, WANG Xiaoping. Review of Language Processing Methods for Visual Question Answering[J]. Computer Engineering and Applications, 2022, 58(17): 50-60.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2203-0243