Research on Multi-Modal Video Paragraph Captioning Based on Dual-Transformer Structure
ZHAO Hong, ZHANG Lijun
Computer Engineering and Applications. 2025, (21): 182-191. DOI: 10.3778/j.issn.1002-8331.2407-0330