Most Downloaded Articles


    Published in last 1 year
    Research on Intelligent Question Answering System Based on Large Language Model
    REN Haiyu, LIU Jianping, WANG Jian, GU Xunxun, CHEN Xi, ZHANG Yue, ZHAO Changxu
    Computer Engineering and Applications    2025, 61 (7): 1-24.   DOI: 10.3778/j.issn.1002-8331.2409-0300
    Abstract (706)      PDF(pc) (1720KB) (715)
    Intelligent question answering is a core subfield in natural language processing, aiming at systems that understand and answer natural language questions posed by users. Traditional question answering systems usually rely on predefined rules and limited corpora and are unable to handle complex multi-round dialogues. Large language models are natural language processing models based on deep learning technology, with billions or even hundreds of billions of parameters. They can not only understand and generate natural language but also significantly improve the accuracy and efficiency of question answering systems, promoting the development of intelligent question answering technology. In recent years, intelligent question answering based on large model technology has gradually become a research hotspot, but a systematic review in this field is still relatively lacking. Therefore, this article conducts a systematic review of intelligent question answering systems based on large models. Firstly, it introduces the basic concepts of question answering systems, datasets, and their evaluation metrics. Secondly, it presents question answering systems based on large models, including those based on prompt learning, knowledge graphs, retrieval-augmented generation, and intelligent agents, as well as the technical route of fine-tuning in question answering tasks, and compares the advantages, disadvantages, and application scenarios of the five methods in question answering systems. Finally, it summarizes the current research challenges and future development trends of question answering systems based on large language models.
    Improved YOLOv11n Small Object Detection Algorithm in UAV View
    LI Bin, LI Shenglin
    Computer Engineering and Applications    2025, 61 (7): 96-104.   DOI: 10.3778/j.issn.1002-8331.2411-0072
    Abstract (695)      PDF(pc) (1241KB) (634)
    In order to effectively deal with the challenges of complex background, dense target, target miniaturization and mobile terminal deployment faced by small target detection in UAV aerial photography, the YOLOv11n model is improved. Firstly, RFCBAMConv module is used to improve C3k2, which enhances the ability of feature extraction. Then, the dilated feature pyramid convolution (DFPC) module is designed to replace the original SPPF layer. Through multi-scale dilated convolution, the extraction of small target detail features of UAV is strengthened. Secondly, a new feature pyramid structure is proposed, and a feature map output of 160×160 size is added to the P2 layer to extract the feature information of small targets. This method replaces the traditional practice of adding P2 small target detection head. The CSPOK module and ContextGuidedBlock_Down (CGBD) convolution are introduced, which significantly improves the extraction ability of global features and the fusion ability of multi-scale features. Finally, the dynamic detection head (DyHead) is used to replace the original detection head, which improves the target detection accuracy of the model. The experimental results show that the mAP@0.5 and mAP@0.5:0.95 indicators of the improved model on the VisDrone dataset are increased by 0.071 and 0.049, respectively. In addition, the generalization experiments on AI-TOD and SODA-A datasets also show that the improved model achieves 0.055 and 0.048 improvement in mAP@0.5, respectively, which fully verifies the effectiveness and universality of the model.
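The multi-scale dilated convolution mentioned in this abstract can be illustrated with a short sketch. It shows how the dilation rate widens the area a kernel covers without adding weights, using the standard relation k_eff = k + (k - 1)(d - 1). The function name and the dilation rates 1, 2, 3 are assumptions chosen for illustration; the abstract does not state which rates the DFPC module uses.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels) covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Hypothetical dilation rates; the abstract does not list the real ones.
for rate in (1, 2, 3):
    span = effective_kernel_size(3, rate)
    print(f"3x3 kernel, dilation {rate}: covers {span}x{span} pixels, still 9 weights")
```

Running several such branches in parallel is one common way to gather context at more than one scale, which is plausibly what "multi-scale" refers to here.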
    Survey of Retrieval-Augmented Generation Based on Large Language Models
    LIU Xueying, YUN Jing, LI Bo, SHI Xiaoguo, ZHANG Yuying
    Computer Engineering and Applications    2025, 61 (13): 1-25.   DOI: 10.3778/j.issn.1002-8331.2410-0088
    Abstract (490)      PDF(pc) (1412KB) (465)
    Artificial intelligence agents provide efficient solutions to complex tasks and have recently gained attention in industry. As one of the paradigms of artificial intelligence agents, retrieval-augmented generation (RAG), which aims to enhance the quality of generated responses by combining information retrieval and content generation techniques, has gradually become a focus of research. Based on studies of retrieval-augmented generation methods at home and abroad, the basic concept and workflow of RAG are elaborated, the current state of the technology is summarized, the advantages and disadvantages of existing RAG technology are analyzed, and the existing evaluation metrics, datasets, and benchmarks are sorted out. Finally, the challenges RAG technology faces in future application scenarios are discussed, and its future development direction is envisioned.
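The combination of information retrieval and content generation described in this abstract can be sketched in a few lines. This is a toy illustration only: the token-overlap scorer, the prompt wording, and every function name are assumptions made for the example, and the `generate` callable stands in for whatever language model a real system would call.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document):
    """Toy relevance score: number of query tokens that also occur in the document."""
    q, d = Counter(tokenize(query)), Counter(tokenize(document))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query, corpus, k=2):
    """Return the k passages with the highest overlap score."""
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]


def build_prompt(query, passages):
    """Place the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context.\nContext:\n{context}\nQuestion: {query}"


def answer(query, corpus, generate, k=2):
    """Retrieve first, then hand the augmented prompt to the generator."""
    return generate(build_prompt(query, retrieve(query, corpus, k)))


corpus = [
    "RAG combines retrieval with generation.",
    "YOLO is an object detector.",
    "Retrieval finds relevant passages.",
]
# The lambda is a placeholder generator that simply echoes the prompt.
print(answer("What is retrieval augmented generation?", corpus, generate=lambda p: p, k=1))
```

Real systems typically replace the overlap scorer with dense or hybrid retrieval; the two-stage shape (retrieve, then generate) is the part the sketch is meant to show.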
    Review of Deep Learning Models for Image Classification Based on Convolutional Neural Networks
    LIU Hongda, SUN Xuhui, LI Yibin, HAN Lin, ZHANG Yu
    Computer Engineering and Applications    2025, 61 (11): 1-21.   DOI: 10.3778/j.issn.1002-8331.2411-0196
    Abstract (467)      PDF(pc) (1675KB) (437)
    Classification with neural network models has long been an important research direction. With the development of deep learning, the demands placed on neural network models keep rising, and recognition rate, parameter count, and training time are all subject to strict requirements. Convolutional neural networks have long been the mainstream method for image classification in deep learning. This paper first introduces the development history of convolutional neural network classification models and analyzes the design ideas behind each model at different stages. Secondly, it reviews relevant examples of Transformers combined with convolutional neural networks, as well as the application of each model in other fields. Finally, the possible development directions of convolutional neural networks are discussed.
    Research Review of UAV Recognition Based on Multi-Modal Fusion
    LI Minshu, ZHOU Mohan, ZHI Ruicong
    Computer Engineering and Applications    2025, 61 (21): 1-14.   DOI: 10.3778/j.issn.1002-8331.2501-0014
    Abstract (355)      PDF(pc) (17525KB) (415)
    With the rapid development of UAV technology, the application of related technologies is increasing, but it also brings many security risks and regulatory problems. As an important means to deal with these challenges, Anti-UAV detection technology has received extensive attention. Traditional UAV detection methods rely on a single modal data, such as vision, audio, radar and radio frequency signals, etc. However, these single modal data can obtain limited information in complex scenes. In recent years, deep learning methods have made good progress in the field of small object detection, and the related research of multi-modal fusion technology has further improved the accuracy and robustness of object detection. This paper reviews the research progress in the field of UAV detection, and focuses on summarizing the current research status of multimodal fusion technology. In addition, the evaluation indicators and public datasets of related UAV detection are sorted out, the limitations of existing technologies are analyzed, and the research directions for improving detection accuracy and robustness in the future are also pointed out.
    Research Advance of Crack Detection for Infrastructure Surfaces Based on Deep Learning
    HU Xiangkun, LI Hua, FENG Yixiong, QIAN Songrong, LI Jian, LI Shaobo
    Computer Engineering and Applications    2025, 61 (1): 1-23.   DOI: 10.3778/j.issn.1002-8331.2407-0407
    Abstract (391)      PDF(pc) (9136KB) (376)
    After long-term use, civil infrastructure is prone to changes in its physical condition or performance, which can damage its function and service safety. It is therefore essential to monitor the structural health of such facilities. Crack detection is an extremely important part of structural health monitoring. Timely detection and identification of such damage can effectively avoid severe accidents. Crack detection methods based on computer vision are simple, fast, and accurate, and are widely used for surface crack detection in civil infrastructure. This paper reviews deep learning-based crack detection methods for infrastructure surfaces from three detection directions: image classification, object detection, and semantic segmentation. Common data collection methods and commonly used public crack datasets are also summarized. Finally, the difficulties and challenges of deep learning-based surface crack detection methods for infrastructure are discussed, and possible future development directions are envisioned.
    Review of Multi-Modal Driver Emotion Recognition
    ZHOU Xinying, LI Leixiao, LIN Hao, ZHANG Hucheng
    Computer Engineering and Applications    2025, 61 (10): 1-18.   DOI: 10.3778/j.issn.1002-8331.2410-0153
    Abstract (327)      PDF(pc) (1630KB) (365)
    Accurately identifying driver emotions can effectively prevent potential dangerous driving behaviors and reduce the occurrence of traffic accidents. It is an important technology to improve road safety and driving experience. With the progress of artificial intelligence and multi-modal data processing technology, emotion recognition technology has gradually developed from a single-modal approach to a multi-modal approach. This paper reviews the current research progress of multi-modal driver emotion recognition, and focuses on the recognition process of facial expression, voice signal, physiological signal and vehicle behavior. The key steps include data preprocessing, feature extraction and multi-modal fusion. By analyzing the existing research, the advantages and disadvantages of different methods are summarized, and several driver emotion-related datasets are introduced. Finally, combined with the current research challenges, five research directions in the field of multi-modal driver emotion recognition in the future are proposed.
    Status and Challenges of Large Language Models Applications in Vertical Domains
    JI Xinmeng, ZAN Hongying, CUI Tingting, ZHANG Kunli
    Computer Engineering and Applications    2025, 61 (12): 1-11.   DOI: 10.3778/j.issn.1002-8331.2409-0181
    Abstract (455)      PDF(pc) (839KB) (360)
    In recent years, large language models, exemplified by ChatGPT, have garnered significant attention across various fields and demonstrated outstanding performance, fueling a new wave of advancements in artificial intelligence technology. At present, there are over a hundred domestic large language models, spanning multiple industry sectors, with their applications continuously expanding. To better address the development of large language models in natural language processing and their impact on both general tasks and specialized domain applications, this paper reviews the evolution of natural language processing and large language models. It provides an overview of current large model technologies and their applications in vertical domains such as healthcare, law, and finance. Furthermore, it analyzes the challenges faced by large models during deployment, such as limitations in capabilities and collaboration difficulties. Lastly, the paper discusses the future research directions aimed at addressing these issues and enhancing the practical application of large language models.
    DMF-YOLOv11: Target Detection Algorithm for UAV Images Based on Improved YOLOv11n
    HE Zhixuan, CHEN Lili, WANG Xiang, LI Ronghua
    Computer Engineering and Applications    2025, 61 (14): 88-100.   DOI: 10.3778/j.issn.1002-8331.2502-0223
    Abstract (562)      PDF(pc) (1893KB) (343)
    To address the insufficient detection accuracy caused by dense small-sized targets, significant multi-scale variations, and complex scene interference in drone aerial image target detection, this paper proposes an improved YOLOv11n-based algorithm named DMF-YOLOv11. Firstly, a dual bidirectional auxiliary feature pyramid network (DBAFPN) is designed as the Neck structure to enhance feature representation for extremely small and regular small targets through multi-level bidirectional feature fusion. Secondly, a multi-branch hybrid convolution (MBHConv) module is constructed to improve sensitivity toward small-scale targets using parallel heterogeneous convolutional paths. Finally, the self-modulating feature aggregation network (SMFANet) is deeply integrated with the backbone C3K2 module, proposing the C3K2_FMB block to collaboratively extract local details and non-global contextual features. Experiments on the VisDrone2019 dataset demonstrate that DMF-YOLOv11 achieves mAP50 and mAP50-95 scores of 46.2% and 28.4%, respectively, surpassing the baseline YOLOv11n by 11.5 and 8.3 percentage points. The recall rate increases by 9.4 percentage points to 44.6%. The improved algorithm effectively enhances target detection accuracy in drone aerial scenarios.
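The results in this abstract are stated as gains in percentage points, which is an absolute difference between two scores rather than a relative change. The short sketch below works through that arithmetic from the reported figures. The function names are illustrative, and the baseline and relative values it prints are derived here, not quoted from the paper.

```python
def baseline_from_gain(improved: float, gain_points: float) -> float:
    """Baseline score implied by an improved score and a gain in percentage points."""
    return round(improved - gain_points, 1)


def relative_gain(improved: float, gain_points: float) -> float:
    """Relative improvement over the implied baseline, in percent."""
    baseline = improved - gain_points
    return round(100.0 * gain_points / baseline, 1)


# (improved score in %, gain in percentage points), as reported in the abstract.
reported = {"mAP50": (46.2, 11.5), "mAP50-95": (28.4, 8.3), "recall": (44.6, 9.4)}
for name, (improved, gain) in reported.items():
    print(name, baseline_from_gain(improved, gain), relative_gain(improved, gain))
```

Under this reading, the implied YOLOv11n baselines are 34.7% mAP50, 20.1% mAP50-95, and 35.2% recall, so an 11.5-point gain in mAP50 corresponds to a relative improvement of roughly a third.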
    Review on Development of Spatio-Temporal Graph Neural Networks for Traffic Flow Prediction
    YAN Jiahe, LI Honghui, SUN Jing, LIU Jie, ZHANG Junwen, YANG Xiaorui, XU Yi
    Computer Engineering and Applications    2025, 61 (22): 1-19.   DOI: 10.3778/j.issn.1002-8331.2502-0225
    Abstract (464)      PDF(pc) (1418KB) (336)
    In recent years, the application of deep learning in traffic flow prediction has attracted wide attention, especially the spatio-temporal graph neural network has achieved remarkable success in capturing spatio-temporal dependencies and predicting traffic characteristics. Although some reviews have examined the application of spatio-temporal graph neural networks, most of these studies focus primarily on application scenarios and fail to provide an in-depth analysis from the perspective of model design. Furthermore, a unified model classification framework is absent. This paper proposes a hierarchical classification method that considers the key elements such as module selection, fusion mechanism, architecture design, and training strategy. The spatio-temporal graph neural networks can be divided into six categories, namely recurrent graph convolutional network, spatio-temporal fully convolutional network, spatio-temporal attention network, spatio-temporal encoder network, spatio-temporal hybrid architecture network, and spatio-temporal networks with additional strategies. For each category, the unique model construction and fusion mechanisms are analyzed in detail, and the different model variants are compared. By analyzing both representative and recent works, the development trend of spatio-temporal graph neural networks is discussed, and the code addresses of open-source models are provided. Subsequently, the commonly used public datasets are gathered, and the performance of the latest advanced models is visually analyzed by comparing the results of previous experiments. Finally, the development opportunities and challenges in this field are summarized to offer insights for future research.
    Research Progress on Multi-Agent Deep Reinforcement Learning and Scalability
    LIU Yanfei, LI Chao, WANG Zhong, WANG Jieling
    Computer Engineering and Applications    2025, 61 (4): 1-24.   DOI: 10.3778/j.issn.1002-8331.2407-0034
    Abstract (369)      PDF(pc) (2161KB) (333)
    Multi-agent deep reinforcement learning has shown great potential in solving agent collaboration, competition, and communication problems in recent years. However, as its application expands across more domains, scalability has become a focal concern and an important problem on the path from theoretical research to large-scale engineering applications. This paper reviews reinforcement learning theory and typical deep reinforcement learning algorithms, introduces three learning paradigms of multi-agent deep reinforcement learning and their representative algorithms, and briefly summarizes the current mainstream open-source experimental platforms. It then examines research progress on scalability in terms of agent number and scenarios in multi-agent deep reinforcement learning, analyzes the main problems faced by each method, and presents existing solutions. Finally, the application prospects and development trends of multi-agent deep reinforcement learning are outlined, providing references and inspiration for further research in this field.
    Improved RT-DETR Algorithm for Aerial Small Object Detection
    LIU Siyuan, GAO Kai, YONG Longquan
    Computer Engineering and Applications    2025, 61 (4): 272-281.   DOI: 10.3778/j.issn.1002-8331.2407-0399
    Abstract (323)      PDF(pc) (1975KB) (326)
    Aiming to address the issue of missed and false detection of small objects in aerial photography images by existing object detection algorithms, an improved algorithm based on RT-DETR (real-time detection transformer) is proposed. Partial convolution (PConv) is introduced into the backbone network, and a PConvBlock structure is designed. Then, a BasicBlock-PConvBlock module composed of PConvBlocks replaces the original BasicBlock, effectively reducing the number of model parameters. The bidirectional feature pyramid network (BiFPN) structure is adopted to optimize the feature fusion module. The S2 feature is introduced to enhance the detection ability of small objects. The CARAFE upsampling operator is introduced to strengthen the fast fusion of multi-scale features. Experimental results show that the improved model has a 13.9% reduction in parameter number compared to the RT-DETR model, and the mAP0.5 and mAP0.5:0.95 indicators are improved by 2.4 and 1.9 percentage points, respectively on the VisDrone test set. On the TT100K and DOTA datasets, the improved model outperforms the RT-DETR algorithm. The improved model significantly enhances detection accuracy while maintaining a smaller parameter number and computational cost, meeting the real-time detection application requirements for drone aerial photography images.
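A rough sketch of why partial convolution reduces parameters, assuming PConv here follows the common formulation in which a regular convolution is applied to only a fraction of the input channels while the remaining channels pass through unchanged. The channel count of 64 and the ratio of 0.25 are illustrative assumptions, not values taken from the paper, and the function names are hypothetical.

```python
def conv_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weight count of a standard convolution (bias ignored)."""
    return in_channels * out_channels * kernel_size * kernel_size


def pconv_params(channels: int, kernel_size: int, ratio: float = 0.25) -> int:
    """Weight count when only `ratio` of the channels pass through the convolution."""
    conv_channels = int(channels * ratio)
    return conv_params(conv_channels, conv_channels, kernel_size)


full = conv_params(64, 64, 3)   # every channel is convolved
partial = pconv_params(64, 3)   # only a quarter of the channels are convolved
print(full, partial, partial / full)
```

Because the weight count scales with the square of the convolved channel count, convolving a quarter of the channels needs one sixteenth of the weights of the full layer. The 13.9% reduction reported in the abstract is for the whole model, where only some layers are replaced.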
    Small Object Detection in Aerial Imagery Using Multi-Scale Hierarchical Feature Fusion Based Approach
    YANG Hongdan, FU Gui, SHAO Huichao, WANG Yixin, SHAO Yanhua, CHU Hongyu, DENG Hu
    Computer Engineering and Applications    2025, 61 (9): 230-241.   DOI: 10.3778/j.issn.1002-8331.2408-0105
    Abstract (342)      PDF(pc) (4334KB) (288)
    Aiming at the problem of low accuracy in detecting small objects due to large field of view, small object size, and dense distribution in aerial images, a multi-scale feature fusion aerial detection algorithm based on improved YOLOv8 is proposed. Firstly, a lightweight L-MobileViT module is constructed to capture effective relationships between features, mitigate information loss, and improve the detection performance of the model. Secondly, a hierarchical multi-scale fusion module HF (hierarchical fusion) is proposed to integrate deep spatial information and shallow semantic information, enhancing the detection capability of small objects in dense scenes. Finally, a tiny detection head is added, and a large detection head is removed based on YOLOv8 to focus on the detection ability of small objects and reduce the missed detection rate of small objects. Experimental results show that the improved model achieves superior performance on the VisDrone2019 and UAV-TrafficTinyDataset datasets, with mAP@50 increased by 17.6% and 15.7%, respectively, compared to the baseline model. The detection effect of small objects is significantly improved, and the overall performance is better than mainstream aerial detection algorithms. This demonstrates that the improved algorithm has better generalization and robustness, making it suitable for detection tasks in aerial scenarios.
    Survey of Adversarial Attacks for Single Object Tracking
    LU Zhengzhi, HUANG Xichen, PENG Bo
    Computer Engineering and Applications    2025, 61 (16): 1-15.   DOI: 10.3778/j.issn.1002-8331.2410-0308
    Abstract (237)      PDF(pc) (5480KB) (282)
    Single object tracking (SOT) is one of the key tasks in computer vision. With the advancement of artificial intelligence, tracking methods based on deep learning have become the mainstream approach for SOT, significantly improving the performance of tracking systems. However, deep learning methods are vulnerable to adversarial attacks, where attackers induce tracking errors in deep tracking models, severely impacting the precision and robustness of tracking. This paper reviews the development of adversarial attack techniques for SOT in recent years, uncovering the security threats and analyzing the challenges encountered in adversarial attack techniques for SOT. Furthermore, this paper categorizes the existing adversarial attack techniques for SOT based on whether the attack methods align with the online characteristics of video tracking, summarizing their fundamental principles, characteristics, and representative works. Finally, from the perspectives of constructing secure and reliable tracking models and targeting practical applications of tracking attacks, this paper provides an outlook on the future development trends of tracking adversarial technologies. It delves into the pivotal issues in current research on tracking attacks, encompassing tracking adversarial defense, multimodal tracking attacks, physically realizable tracking attacks, and non-cooperative attacks, with the aim of fostering innovation and advancement in this field.
    Improved Target Detection Algorithm for UAV Images with RT-DETR
    JIANG Maoxiang, SI Zhanjun, WANG Xiaozhe
    Computer Engineering and Applications    2025, 61 (1): 98-108.   DOI: 10.3778/j.issn.1002-8331.2405-0331
    Abstract (296)      PDF(pc) (5878KB) (266)
    This paper proposes an improved RT-DETR algorithm for target detection in images from light, small-sized unmanned aerial vehicles (UAVs). To address low detection accuracy caused by flexible and diverse targets and by complex, variable environments, the proposed method enhances the feature extraction capability of the detection model by integrating lightweight SimAM attention and inverted residual modules into the ResNet-r18 backbone network. Furthermore, a cascaded group attention mechanism is employed to optimize the inverted residual modules and feature interaction modules, improving feature selection and achieving refined acquisition of target detection information. Additionally, a 160×160 detection layer is introduced in the neck network to enhance the perception of small targets during the feature fusion stage. Finally, experimental results on the VisDrone2019 dataset show that the improved model has fewer parameters and higher detection accuracy. Further experiments on the Alver_Lab_Ulastirma and HIT-UAV datasets validate the effectiveness and robustness of the proposed improvements.
    Review of Infrared and Visible Image Fusion Methods in Deep Learning Frameworks
    LI Shuhui, CAI Wei, WANG Xin, GAO Weijie, DI Xingyu
    Computer Engineering and Applications    2025, 61 (9): 25-40.   DOI: 10.3778/j.issn.1002-8331.2410-0012
    Abstract (265)      PDF(pc) (25398KB) (266)
    Infrared and visible image fusion (IVIF) fuses complementary information from infrared and visible images to improve image quality and support downstream tasks. Owing to the advantages of deep learning in image fusion, its application in the IVIF field has become a research hotspot. Deep learning-based infrared and visible image fusion methods are summarized and analyzed. According to their fusion frameworks, the methods are divided into four categories: autoencoder-based, convolutional neural network-based, generative adversarial network-based, and Transformer-based. Moreover, the characteristics of each type of IVIF method are compared and analyzed. The main applications in the IVIF field, 6 common datasets, and 8 evaluation metrics are reviewed, and qualitative and quantitative evaluations of various mainstream IVIF methods on typical datasets are carried out. Finally, the limitations of existing IVIF methods are summarized, and future IVIF research directions are outlined.
    Embedded Detection Method of Coating Surface Defects Based on YOLOv4-tiny-SR
    ZHAO Hui, HOU Xutao, SONG Long, XU Ke, SHA Jianjun, CHEN Zongyang
    Computer Engineering and Applications    2025, 61 (8): 239-249.   DOI: 10.3778/j.issn.1002-8331.2312-0180
    Abstract (230)      PDF(pc) (4597KB) (254)
    A detection method for coating surface defects is proposed to solve the problems of low detection accuracy, slow speed, and high hardware requirements in embedded detection of coating surface defects. YOLOv4-tiny-SR uses a new model block, DSRBlock, whose local structure greatly reduces memory consumption and increases detection speed while preserving detection accuracy. A geometric average clustering method is proposed, which changes the cluster-center update from the arithmetic average to the geometric average so that cluster centers do not drift toward large target boxes. In addition, for difficult-to-detect samples, a hard sample loss function is designed to increase the learning intensity of the network and improve detection. Comparative experiments on measured coating surface defect data show that the proposed method has clear advantages over other methods in parameter count, model size, detection speed, and accuracy. Compared with the current mainstream YOLOv4-tiny, the parameter count is reduced by 51.82%, the model size is reduced by 46%, the speed is increased by 39.47%, and the accuracy is increased by 1.25 percentage points. With faster detection, higher accuracy, and lower memory consumption, the method has high practical value for real-time detection of surface defects on embedded devices in industrial applications and can be extended to related fields.
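The geometric average clustering idea in this abstract can be sketched as a k-means variant over box sizes. This is a minimal illustration under stated assumptions: boxes are (width, height) pairs, assignment uses IoU as is common in YOLO-style anchor clustering, and the function names and sample data are hypothetical. Only the switch from arithmetic to geometric mean in the center update comes from the abstract.

```python
import math


def iou_wh(box, center):
    """IoU of two (width, height) boxes that share the same top-left corner."""
    inter = min(box[0], center[0]) * min(box[1], center[1])
    union = box[0] * box[1] + center[0] * center[1] - inter
    return inter / union


def geometric_mean(values):
    """Geometric mean, computed in log space to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def cluster_anchors(boxes, centers, iterations=20):
    """K-means over (w, h) pairs whose center update uses the geometric mean."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for box in boxes:
            # Assign each box to the center it overlaps most.
            best = max(range(len(centers)), key=lambda k: iou_wh(box, centers[k]))
            groups[best].append(box)
        new_centers = [
            (geometric_mean([b[0] for b in g]), geometric_mean([b[1] for b in g])) if g else old
            for old, g in zip(centers, groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# One large box pulls the arithmetic mean far more than the geometric mean.
widths = [10.0, 12.0, 100.0]
print(sum(widths) / len(widths), geometric_mean(widths))
print(cluster_anchors([(10, 10), (12, 12), (100, 100), (120, 120)], [(10, 10), (100, 100)]))
```

For the sample widths, the arithmetic mean is about 40.7 while the geometric mean is about 22.9, which illustrates why the geometric update keeps a cluster center closer to the many small boxes.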
    Small Object Detection Algorithm for UAV Images Based on Improved YOLOv8
    HOU Ying, WU Yan, KOU Xurui, HUANG Jiachao, TUO Jindou, WANG Yuqi, HUANG Xiaojun
    Computer Engineering and Applications    2025, 61 (11): 83-92.   DOI: 10.3778/j.issn.1002-8331.2411-0214
    Abstract (226)      PDF(pc) (1986KB) (254)
    Unmanned aerial vehicle (UAV) images contain a large number of densely distributed small targets, which easily leads to missed and false detections of small targets. Therefore, an improved YOLOv8 small target detection algorithm for UAV images is proposed. Firstly, by utilizing high-resolution shallow feature information with smaller receptive fields and finer spatial detail, a small object detection head is added and four feature extraction heads are used to improve the small object detection rate. Secondly, a small object detection module group with a ConvSPD convolution module and a BiFormer attention enhancement module is designed to improve the YOLOv8 backbone network, which effectively enhances the ability to capture shallow detail features of small objects. Subsequently, to meet the hardware deployment requirements of the model, a reparameterizable Rep-PAN model is adopted to optimize the Neck network. Finally, to improve positioning accuracy, the Focaler-CIoU loss function with a target-size adaptive penalty factor is adopted in the Head network to optimize the regression positioning loss. On the VisDrone-2019 dataset, the improved algorithm obtains 51.2% average detection accuracy, 10.9 percentage points higher than YOLOv8. In addition, its detection frame rate reaches 63.7 FPS, giving good real-time performance.
    Survey on Lane Line Detection Techniques for Classifying Semantic Information Processing Modalities
    HONG Shuying, ZHANG Donglin
    Computer Engineering and Applications    2025, 61 (5): 1-17.   DOI: 10.3778/j.issn.1002-8331.2406-0160
    Abstract (245)      PDF(pc) (2981KB) (251)
    With the rapid development of autonomous driving technology, lane line detection, as its key component, has attracted widespread attention and shown great potential for application in intelligent transportation systems. However, traditional lane line detection techniques usually struggle to provide satisfactory recognition accuracy when dealing with complex environmental challenges. This paper reviews the development of lane detection technology and systematically sorts out 84 advanced algorithms, and innovatively divides them into four categories based on semantic processing: semantic segmentation assistance, semantic information fusion, semantic information enhancement, and semantic relationship modeling. By deeply analyzing the technical characteristics and advantages of these algorithms, the main limitations of current lane line detection technology are revealed. Finally, the future development direction of lane line detection technology is put forward, especially in the utilization of semantic information, and the potential research direction is pointed out.
    Implementation of Meteorological Database Question-Answering Based on Large-Scale Model Retrieval-Augmentation Generation
    JIANG Shuangwu, ZHANG Jiawei, HUA Liansheng, YANG Jinglin
    Computer Engineering and Applications    2025, 61 (5): 113-121.   DOI: 10.3778/j.issn.1002-8331.2406-0230
    Abstract (231)      PDF(pc) (1198KB) (249)
    With the increasing demand for information retrieval and knowledge acquisition, question-answering systems are widely applied across various domains. However, there is a notable lack of specialized question-answering system research in the meteorological field, which severely limits the efficient utilization of meteorological information and the service efficiency of meteorological systems. To address this gap, this paper proposes a retrieval-augmented generation-based question-answering scheme for meteorological databases. The scheme designs a multi-channel query routing (McRR) method based on relational databases (SQL) and document-oriented data (NoSQL). Additionally, to adapt large model queries to databases and enhance the model's understanding of query tables, the paper proposes an instruction query conversion method and a database table summarization method (termed DNSUM) to improve the model's semantic understanding of databases. Furthermore, by integrating key modules such as question understanding, re-rankers, and response generation, the paper constructs an end-to-end intelligent question-answering engine capable of retrieving relevant knowledge and generating answers from multiple data sources. Experimental results on the constructed meteorological question-answering dataset demonstrate that this engine effectively understands user queries and generates accurate answers, exhibiting strong retrieval and response capabilities. This research not only provides a question-answering solution for the meteorological field but also offers new directions for the application of question-answering technology in vertical domains.
    Reference | Related Articles | Metrics
    Review of Generative Image Detection Technology Based on Diffusion
    CHENG Boxuan, LI Mingxuan, ZHANG Zhengyu
    Computer Engineering and Applications    2025, 61 (20): 1-18.   DOI: 10.3778/j.issn.1002-8331.2508-0156
    Abstract215)      PDF(pc) (3310KB)(249)       Save
    A diffusion model generates content through forward diffusion and reverse denoising. Such models have been widely used in object detection, medical imaging, natural language processing, and image generation. As their applications expand, determining the authenticity of generated images has become a hotspot of academic research. Diffusion-generated images are also used to produce fake news images or pornographic images that spread rumors, and they are widely used in gray areas and even in illegal and criminal activities. In recent years, a large body of research has addressed the authenticity of diffusion model-generated images, but existing work lacks a systematic review of detection methods for such images. To fill this gap, this paper comprehensively analyzes the research and development of detection technology for diffusion model-generated images. It outlines the overall process and related steps of ten diffusion-based image generation technologies and examines the advantages and disadvantages of diffusion models relative to other image generation models. It systematically organizes five types of detection technologies for diffusion-generated images, discusses their applications and challenges, and compares them. It also summarizes and systematically compares 22 diffusion model datasets. Finally, in light of the limitations of current detection technology, it discusses future development directions.
    Reference | Related Articles | Metrics
    Survey of Medical Image Segmentation in Deep Learning
    XING Suxia, LI Kexian, FANG Junze, GUO Zheng, ZHAO Shihang
    Computer Engineering and Applications    2025, 61 (7): 25-41.   DOI: 10.3778/j.issn.1002-8331.2409-0142
    Abstract379)      PDF(pc) (1527KB)(248)       Save
    In response to the high-dimensional, complex nature and high-precision demands of medical images, deep learning-based segmentation methods excel in feature extraction and complex pattern recognition. These methods adaptively learn and extract multi-level features from vast datasets, demonstrating high accuracy, robustness, and scalability. By extracting organs, tissues, or lesion areas of interest end to end, they provide substantial assistance to physicians in disease diagnosis, treatment planning, and clinical research. This review focuses on the application and development trajectory of U-Net, Transformer, Mamba, and the segment anything model (SAM) and their variants in medical image segmentation, offering a comprehensive comparative analysis across multiple dimensions. It holds reference value for medical imaging research, clinical diagnosis and treatment decision-making, and the development of innovative medical technology products. Building on this, the review summarizes current challenges in medical image segmentation research and outlines prospects for future research.
    Reference | Related Articles | Metrics
    Review of Development of Visual-Inertial Joint Calibration
    ZHAO Junyang, LYU Shenhua, LI Yongxu, ZHU Huixin, ZHANG Kefan
    Computer Engineering and Applications    2025, 61 (8): 1-16.   DOI: 10.3778/j.issn.1002-8331.2409-0330
    Abstract250)      PDF(pc) (1197KB)(247)       Save
    The joint use of a camera and an IMU (inertial measurement unit) can fully leverage the complementary advantages of the two sensors, enabling data fusion and mutual calibration. In recent years, a variety of intelligent joint calibration methods have emerged, but a unified summary and analysis is lacking. This paper therefore classifies and organizes visual-inertial joint calibration methods, analyzes the application characteristics and limitations of the various approaches, and provides a better basis for choosing among camera-IMU joint calibration methods in application or research. Firstly, this paper introduces the calibration parameters and principles for both the camera and the IMU, discussing these from temporal and spatial perspectives. Secondly, it classifies and comparatively analyzes online and offline temporal calibration methods. From a spatial perspective, the paper categorizes calibration methods based on the distinct principles of IMU and camera calibration into four types: optimization-based calibration, decoupled model-based calibration, filtering-based calibration, and machine learning-based calibration, while evaluating the advantages and characteristics of each approach. Finally, it proposes future development trends of joint calibration: spatiotemporal unified calibration, a greater variety of calibration toolkits, the expansion of machine learning applications, and multi-sensor joint calibration, among others.
    Reference | Related Articles | Metrics
    Review of Multi-UAV Collaborative Planning Research
    NING Cong, FAN Jing, SUN Shukui
    Computer Engineering and Applications    2025, 61 (1): 42-58.   DOI: 10.3778/j.issn.1002-8331.2405-0110
    Abstract383)      PDF(pc) (6173KB)(245)       Save
    Unmanned aerial vehicles (UAVs) play an important role in various industries, and cooperation among multiple UAVs has become a research hotspot. This review addresses the two core problems of multi-UAV cooperative planning: task assignment and path planning. Firstly, it analyzes the complexity of these two problems and the information coupling between their sub-problems, with a focus on decoupling strategies. Secondly, it describes a general mathematical model of the multi-UAV cooperative planning problem and summarizes common environment modeling methods and the constraints of multi-objective optimization. Thirdly, it summarizes task planning methods based on centralized and distributed control, as well as the application and research progress of heuristic algorithms in multi-UAV cooperative planning, and highlights cooperative planning methods that meet real-time requirements. Lastly, drawing on representative studies, it discusses future research methods and challenges and anticipates the development of multi-UAV cooperative planning.
    Reference | Related Articles | Metrics
    Review of Application of BEV Perceptual Learning in Autonomous Driving
    HUANG Deqi, HUANG Haifeng, HUANG Deyi, LIU Zhenhang
    Computer Engineering and Applications    2025, 61 (6): 1-21.   DOI: 10.3778/j.issn.1002-8331.2407-0501
    Abstract286)      PDF(pc) (2079KB)(243)       Save
    As the sensors that feed the autonomous driving perception module continue to diversify, it becomes increasingly difficult to represent multi-modal data uniformly. BEV perception learning can integrate multi-modal data into a unified feature space, which gives it greater development potential than other perception learning models. The reasons for this potential are summarized from five aspects: research significance, spatial deployment, preparation work, algorithm development, and evaluation metrics. From a framework perspective, BEV perception models fall into four series: the Lift-Splat-Lss series, inverse perspective mapping (IPM) conversion, MLP view conversion, and Transformer view conversion. Input data fall into two categories. The first, pure image feature input, includes monocular camera input and multi-camera input. The second, fusion data input, covers not only simple fusion of point cloud data and image features but also knowledge distillation fusion guided or supervised by point cloud data and fusion of height segmentation by guided slicing. The paper then gives an overview of four autonomous driving tasks in BEV perception models (multi-target tracking, map segmentation, lane detection, and 3D object detection) and summarizes the shortcomings of the four series of current BEV perception learning frameworks.
    Reference | Related Articles | Metrics
    DCD-YOLOv8n: Efficient Algorithm for Steel Surface Defect Detection
    LIANG Liming, CHEN Kangquan, ZHONG Yi, LONG Pengwei, FENG Yao
    Computer Engineering and Applications    2025, 61 (7): 117-127.   DOI: 10.3778/j.issn.1002-8331.2409-0248
    Abstract264)      PDF(pc) (2671KB)(241)       Save
    To address the issues of high resource consumption and low detection accuracy and efficiency in existing steel surface defect detection algorithms, a high-efficiency steel defect detection algorithm based on YOLOv8n, named DCD-YOLOv8n, is proposed. Firstly, a lightweight diverse branch block efficient layer aggregation network is designed to effectively reduce model size and enhance detection speed. Secondly, a cross-dimensional aggregation module is utilized to model multi-dimensional features via adaptive mechanisms, improving detection accuracy. Finally, a deformable multi-head attention mechanism is introduced to dynamically adjust the shape and scope of attention, effectively handling defects with diverse shapes and complex structures, and thus enhancing detection performance. Experimental validation on the Severstal and NEU-DET steel defect datasets shows that, compared to the YOLOv8n algorithm, the DCD-YOLOv8n algorithm improves mAP by 2.4% and 1.9% respectively, reduces parameters and computation by 0.5×10^6 and 1.9×10^9 respectively, and increases FPS by 22 and 7 frames respectively. The experimental results demonstrate that the algorithm excels in balancing computational cost, detection accuracy, and efficiency, offering significant practical deployment value.
    Reference | Related Articles | Metrics
    Survey of Link Prediction in Knowledge Graph Embedding Methods
    LIU Haichao, LIU Lin, WANG Hailong, ZHAO Weiwei, LIU Jing
    Computer Engineering and Applications    2025, 61 (8): 17-34.   DOI: 10.3778/j.issn.1002-8331.2407-0158
    Abstract228)      PDF(pc) (1109KB)(241)       Save
    Knowledge graphs often suffer from issues such as missing entities and relationships. Knowledge graph completion, which addresses these deficiencies, has garnered significant attention from researchers. Link prediction based on knowledge graph embedding, as an important research direction for knowledge graph completion, can predict missing entities or relationships in the knowledge graph, thereby enhancing its completeness. Firstly, this paper presents the research background, significance, and definition of link prediction in knowledge graphs. Secondly, based on the number of entities in the embedding unit, the link prediction models for knowledge graph embedding are divided into two-entity embedding link prediction models and multi-entity embedding link prediction models. The ideas behind model construction are elaborated, the experimental results are analyzed, and the advantages and disadvantages of the various models are summarized. Finally, the current status of link prediction based on knowledge graph embedding is reviewed and future research directions are outlined, to provide inspiration and guidance for subsequent development.
    Reference | Related Articles | Metrics
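To make the link prediction task in the survey above concrete, here is a minimal sketch using a TransE-style translation score, a well-known embedding approach that scores a triple (head, relation, tail) by the distance between head + relation and tail. The abstract does not name specific models, so the choice of TransE, the toy two-dimensional vectors, and the function names `transe_distance` and `rank_tails` are illustrative assumptions rather than the surveyed authors' method.

```python
import math


def transe_distance(head, relation, tail):
    """Euclidean distance ||head + relation - tail||; smaller means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, candidates):
    """Return candidate tail names ordered from most to least plausible.

    `candidates` maps an entity name to its embedding vector (hypothetical toy data).
    """
    return sorted(candidates, key=lambda name: transe_distance(head, relation, candidates[name]))


if __name__ == "__main__":
    # Toy embeddings, hand-picked for illustration only (not trained).
    paris = [1.0, 0.0]
    capital_of = [0.0, 1.0]
    entities = {"France": [1.0, 1.0], "Germany": [3.0, 1.0], "Japan": [-2.0, 4.0]}
    # Predict the missing tail of (Paris, capital_of, ?).
    print(rank_tails(paris, capital_of, entities))
```

In a trained model the vectors would be learned so that true triples score lower than corrupted ones; the ranking step shown here is the part that fills in a missing entity.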
    Review of Lung CT Image Lesion Region Segmentation Based on Deep Learning
    LI Xiaotong, MA Sufen, SHENG Hui, WEI Guohui, LI Xintong
    Computer Engineering and Applications    2025, 61 (4): 25-42.   DOI: 10.3778/j.issn.1002-8331.2403-0315
    Abstract265)      PDF(pc) (4394KB)(238)       Save
    Lung cancer poses a serious threat to people’s lives and health. The morphology of lesion areas in lung CT images is complex and diverse, and achieving high-precision segmentation of these areas has become a highly challenging key issue in the field of computer-aided diagnosis. The segmentation of lung lesion regions based on deep learning not only helps doctors diagnose early lung cancer quickly and accurately, but also has important clinical value for the treatment of lung cancer. To support in-depth research on lung lesion segmentation techniques, common datasets and evaluation metrics are introduced. Deep learning models for lung lesion region segmentation are reviewed in three categories: segmentation models based on convolutional neural networks, segmentation models based on U-Net, and segmentation models based on generative adversarial networks. The innovations of domestic and foreign research over the past 5 years are summarized through specific experiments. The segmentation performance of the various models is compared and analyzed, their advantages and disadvantages are summarized, and the development direction of the field is discussed.
    Reference | Related Articles | Metrics
    Overview of Multi-View 3D Reconstruction Techniques in Deep Learning
    WANG Wenju, TANG Bang, GU Zehua, WANG Sen
    Computer Engineering and Applications    2025, 61 (6): 22-35.   DOI: 10.3778/j.issn.1002-8331.2405-0328
    Abstract262)      PDF(pc) (3077KB)(225)       Save
    Classic multi-view 3D reconstruction methods struggle to reconstruct complex objects, produce poor reconstruction results, and scale poorly to high resolution, so deep learning methods have been introduced to reconstruct 3D models with higher accuracy. This paper systematically summarizes, analyzes, and compares multi-view 3D reconstruction algorithms that use deep learning, and classifies the algorithms of recent years according to explicit and implicit geometry representations. It focuses on neural implicit 3D reconstruction algorithms that combine implicit functions with volume rendering, which currently achieve high reconstruction accuracy, and conducts quantitative and qualitative analyses of some of these algorithms. In addition, commonly used datasets and evaluation metrics are listed, and future research trends and development directions are discussed.
    Reference | Related Articles | Metrics
    MBFE-DETR: Multi-Scale Boundary Feature Enhancement for Drone Target Detection Algorithm
    ZHANG Xi, LAI Huicheng, JIANG Di, TANG Jingwen, GAO Guxue, YUAN Tingting, NIE Yuan
    Computer Engineering and Applications    2025, 61 (17): 89-101.   DOI: 10.3778/j.issn.1002-8331.2503-0307
    Abstract157)      PDF(pc) (5562KB)(223)       Save
    Aiming at problems such as complex backgrounds, high proportion of small targets, and sample imbalance in drone perspective views, an improved drone object detection algorithm based on RT-DETR called MBFE-DETR is proposed. Firstly, a lightweight backbone network based on C2f and single-head self-attention modules is designed, reducing model parameters while enhancing feature extraction capabilities. Secondly, a multi-scale boundary feature enhancement collaborative network (MBFECN) is proposed, which addresses the original model’s deficiencies in preserving small target boundary details through its unique multi-scale boundary feature enhancement mechanism and efficient feature fusion strategy. Then, Focaler-MPDIoU is introduced to consider the positional matching relationship between bounding boxes, while reconstructing the original IoU loss through linear interval mapping, resulting in better localization performance in complex scenes. Finally, to address sample imbalance, a new classification loss function called ESVLoss is adopted, which applies segmented weighted adjustments to classification loss values and combines an exponential moving average mechanism to dynamically update weights smoothly, making the model more adaptive. Experimental results show that on the VisDrone2019-DET and DOTAv1.0 datasets, the MBFE-DETR algorithm improves mAP50 by 3.9 and 2.9 percentage points respectively, while reducing parameters by 21.6%.
    Reference | Related Articles | Metrics
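The MBFE-DETR abstract above mentions reconstructing the IoU loss through linear interval mapping. The sketch below shows one common reading of that idea: IoU values are linearly rescaled from an interval [d, u] onto [0, 1] and clipped outside it, so the loss concentrates on boxes within a chosen difficulty range. The thresholds `d` and `u`, the function names, and the `(x1, y1, x2, y2)` box format are assumptions for illustration; the MPDIoU term and the paper's actual settings are not reproduced here.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def linear_interval_map(iou, d, u):
    """Map iou linearly from [d, u] to [0, 1]; values below d give 0, above u give 1."""
    if not d < u:
        raise ValueError("expected d < u")
    return min(1.0, max(0.0, (iou - d) / (u - d)))


def remapped_iou_loss(pred_box, target_box, d, u):
    """Loss of the form 1 - remapped IoU (assumed form, for illustration)."""
    return 1.0 - linear_interval_map(box_iou(pred_box, target_box), d, u)


if __name__ == "__main__":
    pred, target = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
    print(box_iou(pred, target))                    # overlap 1, union 7
    print(remapped_iou_loss(pred, target, 0.0, 0.5))
```

Narrowing the interval steepens the mapping inside it, which is how such a remapping shifts emphasis between easy and hard samples.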
    Survey of Collaborative Symbiosis Mode Between Knowledge Graph and Large Language Model and Its Education Application
    LI Xiaoli, LIU Chunfang, GENG Shaokun
    Computer Engineering and Applications    2025, 61 (15): 1-13.   DOI: 10.3778/j.issn.1002-8331.2410-0481
    Abstract244)      PDF(pc) (1158KB)(222)       Save
    In recent years, the rapid development of artificial intelligence technology, especially large language models and knowledge graph technology, has provided important technical conditions for the digital and intelligent transformation of education. Firstly, the application advantages, current status, and disadvantages of large language models and knowledge graphs in the field of intelligent education are analyzed separately. On this basis, the collaborative symbiosis mode between knowledge graphs and large language models is discussed in depth, including the ways in which each enhances the other. The research status of collaborative technology is analyzed, and its relevant applications in the field of education are summarized. Finally, the development trend of the joint application of knowledge graphs and large language models in education is summarized and future prospects are discussed.
    Reference | Related Articles | Metrics
    Review of Collaborative Inference Methods for Edge Intelligence
    ZHAO Chanchan, LYU Fei, SHI Bao, YU Xiaomin, YANG Xingchen, YUE Xiaocan
    Computer Engineering and Applications    2025, 61 (3): 1-20.   DOI: 10.3778/j.issn.1002-8331.2406-0040
    Abstract261)      PDF(pc) (7788KB)(219)       Save
    With the development of edge intelligence, collaborative inference technology has made significant progress in enhancing the efficiency and performance of intelligent applications through collaboration among cloud, edge, and terminal devices. This paper outlines the performance metrics, application scenarios, and challenges of edge intelligence, and introduces four inference paradigms under collaborative inference technology according to the hierarchical architecture of edge intelligence: end-to-end collaboration, edge-to-end collaboration, edge-to-edge collaboration, and cloud-edge-end collaboration. Based on the limitations and differences of application scenarios for collaborative inference technology, the advantages, limitations, principles, and optimization goals of collaborative inference technology in different inference paradigms are comprehensively analyzed and compared. The discussion delves into issues such as computational resource allocation, inference latency optimization, and throughput optimization solved by collaborative inference technology in different application scenarios. It also points out challenges in privacy security, communication service resource management, and collaborative training within edge intelligence. Future development trends and research directions are discussed, providing references and insights for research in this field.
    Reference | Related Articles | Metrics
    Research Progress of Vehicle Re-Identification Based on Deep Learning
    PING Can, LI Leixiao, LIU Dongjiang, LIN Hao, SHI Jianping
    Computer Engineering and Applications    2025, 61 (16): 16-37.   DOI: 10.3778/j.issn.1002-8331.2409-0384
    Abstract190)      PDF(pc) (6868KB)(217)       Save
    With the growing demand for vehicle re-identification in intelligent surveillance and public safety, deep learning-based methods have become a research hotspot due to their powerful image processing capabilities. Traditional handcrafted feature methods can no longer meet the needs of modern vehicle re-identification, which must process massive amounts of data. This paper summarizes the current research on vehicle re-identification based on deep learning. It categorizes existing methods into representation learning and cross-domain learning based on data input sources. Representation learning focuses on the extraction and fusion of global and auxiliary features, while cross-domain learning addresses adaptability issues between different domains. It reviews key technologies of various methods and discusses their advantages and limitations. Future research directions are explored, and improvements in vehicle re-identification accuracy and robustness are proposed through multimodal data fusion, unsupervised learning methods, and large language models.
    Reference | Related Articles | Metrics
    Road Object Detection Algorithm for Outdoor Blind Navigation Scenarios
    LI Ming, HE Zhiqi, DANG Qingxia, ZHU Shengli
    Computer Engineering and Applications    2025, 61 (9): 242-254.   DOI: 10.3778/j.issn.1002-8331.2406-0387
    Abstract226)      PDF(pc) (2972KB)(216)       Save
    To address the challenges of complex background interference and the need for key semantic information in road object detection for outdoor blind navigation, as well as the low accuracy and frequent missed detections of current models, a road object detection algorithm named OD-YOLO is proposed, based on YOLOv8n. The backbone network utilizes FasterNet to enhance feature extraction. In the SPPF module, a large separable kernel attention (LSKA) mechanism is introduced to improve the perception of road objects. The GAC2f module is designed to reduce computational load while enhancing feature capture capability. By optimizing GAC2f with structural reparameterization from the diverse branch block (DBB), multiple features are fused without sacrificing performance, significantly improving accuracy. Additionally, the LarK large kernel convolution, optimized with the convolutional gated linear unit (Convolutional GLU), captures more contextual information. The lightweight asymmetric detection head, PADH, enhances performance while reducing the number of parameters. The loss function is refined using PIoUv2, and further model optimization is achieved through layer-adaptive sparsity for magnitude-based pruning (LAMP). Experimental results on the WOTR public pedestrian road object dataset demonstrate that OD-YOLO, compared to YOLOv8n, reduces parameters to 3×10^6 and improves mAP@0.5 and mAP@0.5:0.95 by 3.4 and 4.1 percentage points, respectively. The results indicate that OD-YOLO achieves the expected effect in road object detection for outdoor blind navigation scenarios.
    Reference | Related Articles | Metrics
    Improved YOLOv11 Algorithm for Small Target Detection in UAVs
    LIU Yuping, SHANG Cuijuan, LI Mingming
    Computer Engineering and Applications    2025, 61 (15): 124-131.   DOI: 10.3778/j.issn.1002-8331.2503-0274
    Abstract343)      PDF(pc) (1400KB)(214)       Save
    To address the difficulties of small target detection for unmanned aerial vehicles (UAVs), such as few pixels, large scale variations, and susceptibility to background interference, an improved algorithm based on YOLOv11 is proposed. Firstly, a new ELAN-DC module is designed to improve the backbone network, combining double convolution (DC) in the CBS module of the efficient layer aggregation network (ELAN) to enhance the feature extraction capability of the backbone. Secondly, a new global-to-local bidirectional feature fusion structure, GLBiFPN, is designed to improve multi-scale feature fusion. Finally, a dynamic detection head, DyHead, is introduced to further enhance the detection accuracy of the model. Experimental results show that on the VisDrone2019 dataset, the mAP50 and mAP50-95 of the proposed algorithm increase by 5.1 and 3.5 percentage points respectively, compared to YOLOv11n.
    Reference | Related Articles | Metrics
    Review of Visible and Infrared Image Fusion for Intelligent Object Detection
    ZHU Ziwen, SONG Xiao’ou, CUI Wei, QI Fengli
    Computer Engineering and Applications    2025, 61 (17): 17-32.   DOI: 10.3778/j.issn.1002-8331.2501-0206
    Abstract242)      PDF(pc) (1610KB)(213)       Save
    With the rapid development of artificial intelligence, object detection and recognition have become increasingly important. Deep learning-based object detection techniques that fuse visible and infrared images demonstrate robust feature extraction and generalization capabilities, effectively integrating features from both modalities. This paper first reviews the current state of dual-modal image fusion for object detection. It then analyzes the advantages of dual-modal fusion within deep learning-based detection and compares commonly used datasets and key technical challenges. Next, the paper summarizes object detection algorithms based on different fusion stages, emphasizing the benefits and dominance of feature-level fusion. It further analyzes fusion detection algorithms based on different base models, highlighting the advantages and dominant role of the Transformer and the potential of Mamba for future research. Finally, the paper provides a forward-looking perspective on future research oriented towards practical applications.
    Reference | Related Articles | Metrics
    Review of Research Progress in Object Detection Driven by Deep Learning
    SHAN Xianying, ZHANG Lin, LI Zehui
    Computer Engineering and Applications    2025, 61 (1): 24-41.   DOI: 10.3778/j.issn.1002-8331.2407-0038
    Abstract270)      PDF(pc) (7781KB)(213)       Save
    In recent years, deep learning, driven by high-performance GPU computing, has rapidly expanded into security, healthcare, and industry. Object detection models have evolved from traditional methods to convolutional neural networks (CNN), significantly saving resources. This review outlines the development of object detection and recent advances in deep learning by referencing extensive literature and following a two-stage framework. It compares model performance across different datasets, summarizes the strengths and weaknesses of various methods, and highlights key datasets. The review also discusses the practical applications of object detection algorithms, particularly in autonomous driving, medical imaging, and remote sensing. Finally, it explores the opportunities and challenges for future research in deep learning-driven object detection.
    Reference | Related Articles | Metrics
    LMUAV-YOLOv8: Lightweight Network for Object Detection in Low-Altitude UAV Vision
    DONG Yibing, ZENG Hui, HOU Shaojie
    Computer Engineering and Applications    2025, 61 (3): 94-110.   DOI: 10.3778/j.issn.1002-8331.2407-0127
    Abstract237)      PDF(pc) (4352KB)(213)       Save
    To tackle the weak sensing capacity and high missed-detection rates for small-scale objects in low-altitude UAV imagery of complex traffic scenarios, the LMUAV-YOLOv8 algorithm is proposed. Its efficiency and advantages are verified through ablation and comparative experiments, and its internal mechanisms are visualized using class activation mapping. Firstly, a lightweight feature fusion network (UAV_RepGFPN) is introduced, with new feature fusion paths and a feature fusion module, DBB_GELAN, which reduces the number of parameters and the computation while improving the performance of the feature fusion network. Secondly, the feature extraction module (FTA_C2f) is constructed using partial convolution (PConv) and a triplet attention mechanism (Triplet Attention), and the ADown down-sampling module is introduced. By rearranging the dimensions of the input feature maps and making fine-grained adjustments, the ability of the deep network to capture spatial features is enhanced, further reducing the number of parameters and the computation. Then, to address the large information loss during layer-by-layer feature extraction and spatial transformation, a new context-guided programmable gradient information (UAV_PGI) strategy is proposed. By designing a context-guided reversible architecture and three additional auxiliary detection heads, UAV_PGI significantly enhances detection capabilities for aerial objects. To verify the validity and generalization ability of the model, comparative experiments are carried out on the VisDrone 2019 test set. The results show that, compared with YOLOv8s, LMUAV-YOLOv8s improves precision, recall, mAP@0.5, and mAP@0.5:0.95 by 4.2, 3.9, 5.1, and 3.0 percentage points, respectively, with the computational cost increased by only 0.4 GFLOPs and the parameter count reduced by 63.9%, striking a good balance between performance and cost. Inference experiments on the NVIDIA Jetson Xavier NX embedded platform show that, compared with the baseline model, the proposed algorithm obtains higher detection accuracy while meeting real-time detection requirements, making it more suitable for real-time target detection on drones. Finally, the decision-making process is visualized using class activation mapping, which provides an intuitive way to understand the internal mechanisms of the network. The results show that the proposed model has superior small-scale feature extraction and high-resolution processing capabilities.
    Reference | Related Articles | Metrics
    Construction and Application of Multimodal Knowledge Graph in Construction Safety Field Based on Large Language Model
    DONG Lei, WU Fuju, SHI Jianyong, PAN Longfei
    Computer Engineering and Applications    2025, 61 (9): 325-333.   DOI: 10.3778/j.issn.1002-8331.2408-0036
    Abstract401)      PDF(pc) (1439KB)(211)       Save
    Existing construction safety management methods struggle to integrate multimodal text and image information effectively, knowledge representation and reasoning for construction site safety accidents are limited, and processing and applying the data requires broad domain knowledge and professional background. To address these problems, this paper proposes a multimodal knowledge graph construction method based on a multimodal large language model. Through three steps, covering data collection and preprocessing and ontology construction at the concept and instance levels, the multimodal knowledge graph is constructed to solve the problems of integrating text and images and of limited knowledge representation and reasoning in the field. The constructed knowledge graph not only integrates the accident safety knowledge in the text but also includes scene image information, which improves the comprehensiveness and practicality of the knowledge. Three metrics (accuracy, recall, and F1 score) are used to evaluate the extraction results, and the high scores obtained verify the rationality and accuracy of the large model for image extraction. In practical application, this method helps safety managers discover safety accidents on construction sites in time and provides important support for management decision-making and intelligent reasoning.
    Reference | Related Articles | Metrics
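The construction safety abstract above evaluates extraction results with accuracy, recall, and F1. A minimal sketch of how set-based extraction metrics of this kind are commonly computed follows, assuming that "accuracy" is meant in the sense of precision and that extracted items are compared to a gold set by exact match. The triples and the function name `extraction_scores` are hypothetical and do not come from the paper.

```python
def extraction_scores(predicted, gold):
    """Return (precision, recall, f1) for a set of extracted items against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical (subject, relation, object) triples, for illustration only.
    gold = {
        ("worker", "lacks", "helmet"),
        ("scaffold", "missing", "guardrail"),
        ("crane", "near", "power line"),
        ("pit", "lacks", "cover"),
    }
    predicted = {
        ("worker", "lacks", "helmet"),
        ("scaffold", "missing", "guardrail"),
        ("ladder", "on", "slope"),
    }
    precision, recall, f1 = extraction_scores(predicted, gold)
    print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so it stays low unless both are high, which is why it is usually reported alongside the other two.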
    Review of Question Answering Techniques for Knowledge Graph
    QIAN Shenyi, FU Bowen, LI Daiyi, LIANG Yaoyao
    Computer Engineering and Applications    2025, 61 (23): 1-23.   DOI: 10.3778/j.issn.1002-8331.2501-0066
    Abstract109)      PDF(pc) (1714KB)(207)       Save
    Intelligent question answering is a key technology for obtaining needed information accurately and quickly from massive data. In recent years it has developed remarkably, with advances such as question-based information extraction, semantic understanding, and vector modeling methods. With this rapid development, however, a reasonable classification of intelligent question answering models is needed, both to help users in different fields apply them and to support in-depth study by researchers in the field. Through an investigation of the relevant literature on knowledge graph question answering (Q&A), this paper summarizes its key technologies, including entity linking and knowledge embedding, and introduces the related concepts and processing flow in detail. In addition, knowledge graph-oriented question answering techniques are divided by method into three main categories: semantic parsing, information retrieval, and large language model-based methods. Their advantages and disadvantages are introduced, and the evaluation metrics of knowledge graph question answering models are summarized. Finally, some suggestions and thoughts are put forward on the existing problems and the future development direction of knowledge graph question-answering technology.
    Reference | Related Articles | Metrics