[1] 崔磊,徐毅恒,吕腾超,等.文档智能:数据集、模型和应用[J].中文信息学报, 2022, 36(6): 1-19.
CUI L, XU Y H, LV T C, et al. Document AI: benchmarks, models and applications[J]. Journal of Chinese Information Processing, 2022, 36(6): 1-19.
[2] APPALARAJU S, JASANI B, BHARGAVA U K, et al. DocFormer: end-to-end Transformer for document understanding[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[3] KIM G, HONG T, YIM M, et al. OCR-free document understanding transformer[C]//Proceedings of the 17th European Conference on Computer Vision, 2022: 498-517.
[4] MISTRY J, ARZENO N M. Document understanding for healthcare referrals[C]//Proceedings of the 11th IEEE International Conference on Healthcare Informatics (ICHI), 2023.
[5] ŠIMSA Š, ŠULC M, UŘIČÁŘ M, et al. DocILE benchmark for document information localization and extraction[C]//Proceedings of the 17th International Conference on Document Analysis and Recognition (ICDAR 2023), San José, CA, USA, August 21-26, 2023.
[6] NAJEM-MEYER S, ROMANELLO M. Page layout analysis of text-heavy historical documents: a comparison of textual and visual approaches[C]//Proceedings of the Computational Humanities Research Conference, 2022.
[7] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019.
[8] XU Y H, LI M H, CUI L, et al. LayoutLM: pre-training of text and layout for document image understanding[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD’20), 2020.
[9] XU Y, XU Y H, LV T C, et al. LayoutLMv2: multi-modal pre-training for visually-rich document understanding[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021: 2579-2591.
[10] HUANG Y P, LV T C, CUI L, et al. LayoutLMv3: pre-training for document AI with unified text and image masking[C]//Proceedings of the 30th ACM International Conference on Multimedia, 2022.
[11] LIU X, ZHENG Y N, DU Z X, et al. GPT understands, too[J]. arXiv:2103.10385, 2021.
[12] GU Z X, MENG C H, WANG K, et al. XYLayoutLM: towards layout-aware multimodal networks for visually-rich document understanding[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[13] LUO C W, CHENG C X, ZHENG Q, et al. GeoLayoutLM: geometric pre-training for visual information extraction[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[14] XU Y H, LV T C, CUI L, et al. LayoutXLM: multimodal pre-training for multilingual visually-rich document understanding[C]//Proceedings of the Annual Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing (ACL 2021), 2021.
[15] REYNOLDS L, MCDONELL K. Prompt programming for large language models: beyond the few-shot paradigm[C]//Proceedings of the Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems (CHI EA’21), 2021.
[16] LI X L, LIANG P. Prefix-tuning: optimizing continuous prompts for generation[C]//Proceedings of the Annual Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing (ACL 2021), 2021.
[17] LESTER B, AL-RFOU R, CONSTANT N. The power of scale for parameter-efficient prompt tuning[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021.
[18] WANG L, HE J B, XU X, et al. Alignment-enriched tuning for patch-level pre-trained document image models[C]//Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence (AAAI’23/IAAI’23/EAAI’23), 2023.
[19] XU L, JIE Z M, LU W, et al. Better feature integration for named entity recognition[C]//Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2021.
[20] KUMAR R, GOYAL S, VERMA A, et al. ProtoNER: few shot incremental learning for named entity recognition using prototypical networks[C]//Proceedings of the International Conference on Business Process Management (BPM), 2023.
[21] BROWN T B, MANN B, RYDER N, et al. Language models are few-shot learners[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS’20), 2020: 1877-1901.
[22] SILAJEV I, VICTOR N, MORTIMER P. Semantic table detection with LayoutLMv3[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.