Computer Engineering and Applications

Improved RTMDet for SAR Ship Detection

ZHANG Yuning, JIA Yuan, CHEN Yue

Computer Engineering and Applications 2024, 60 (22): 314-322. DOI: 10.3778/j.issn.1002-8331.2307-0175

A synthetic aperture radar (SAR) ship detection algorithm with improved RTMDet (real-time models for object detection) is proposed to address the problem of low detection accuracy in small target ships and complex backgrounds in SAR images. Firstly, the basic building blocks in backbone network structure are optimized, and the global attention mechanism SimAM (simple, parameter-free attention module) is introduced, which improves the ability of the model to extract key feature information without adding additional parameters. In order to reduce the loss of small target feature information and increase its share in shallow feature map during feature fusion, a new lightweight feature fusion module SPD-RPAFPN (space to depth reverse path aggregation feature pyramid network) is constructed. Finally, the regression loss function is replaced with KFIoU (Kalman filter based intersection over union) in the prediction module to improve the detection capability of the model for small target ships. Experimental comparisons are conducted on the publicly available dataset RSDD. Compared with RTMDet, the improved model improves the inshore AP value by 14.6 percentage points and the total AP value by 2.7 percentage points to 90.7%, while the number of model parameters and computational effort are decreased by 4.5% and 10.8%. Compared with the current mainstream algorithm, the SAR ship detection accuracy is also significantly improved, which proves the effectiveness of the improved RTMDet algorithm.
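
The abstract does not describe the internals of SPD-RPAFPN, so the following is only a minimal sketch of the generic space-to-depth rearrangement that the name refers to, assuming a single-channel feature map stored as a nested list and a block size of 2. The function name `space_to_depth` and the sample map are illustrative and not taken from the paper. The point it shows is that downsampling by rearrangement keeps every value, which is why it is attractive for small targets.

```python
def space_to_depth(feature_map, block=2):
    """Split an H x W map into block*block sub-maps of size (H/block) x (W/block).

    Every input value is kept; spatial resolution is traded for channels.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    channels = []
    for dy in range(block):          # row offset inside each block
        for dx in range(block):      # column offset inside each block
            sub = [row[dx::block] for row in feature_map[dy::block]]
            channels.append(sub)
    return channels


# Hypothetical 4 x 4 feature map used only for illustration.
fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
sub_maps = space_to_depth(fmap)
```

Unlike strided convolution or pooling, no activation is discarded here: the four 2 x 2 sub-maps together contain all 16 input values.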

Review of Research on Artificial Intelligence in Traditional Chinese Medicine Diagnosis and Treatment

SU Youli, HU Xuanyu, MA Shijie, ZHANG Yuning, Abudukelimu Abulizi, Halidanmu Abudukelimu

Computer Engineering and Applications 2024, 60 (16): 1-18. DOI: 10.3778/j.issn.1002-8331.2312-0400

The field of traditional Chinese medicine (TCM) diagnosis and treatment is gradually moving towards standardization, objectification, modernization, and intelligence. In this process, the integration of artificial intelligence (AI) has greatly propelled the advancement of TCM diagnosis and treatment, scientific research, and TCM inheritance. The review starts from the current research status of AI in TCM, combs through the application and development of AI in TCM in three stages from expert system and rule engines, traditional machine learning algorithm to deep learning, and then summarizes the knowledge management tools and large language models of TCM in recent years. Finally, this paper analyzes the multiple challenges of data fairness, multimodal data understanding, model robustness, personalized medicine, and interpretability that exist at this stage of AI in TCM. To address these challenges, it is necessary to continuously explore and propose possible solutions to promote the in-depth development of intelligent TCM diagnosis and treatment, thus better meeting the health needs of people.

Research Progress on Designing Lightweight Deep Convolutional Neural Networks

ZHOU Zhifei, LI Hua, FENG Yixiong, LU Jianguang, QIAN Songrong, LI Shaobo

Computer Engineering and Applications 2024, 60 (22): 1-17. DOI: 10.3778/j.issn.1002-8331.2404-0372

Lightweight design is a popular paradigm to address the dependence of deep convolutional neural network (DCNN) on device performance and hardware resources, and the purpose of lightweighting is to increase the computational speed and reduce the memory footprint without sacrificing the network performance. An overview of lightweight design approaches for DCNNs is presented, focusing on a review of the research progress in recent years, including two major lightweighting strategies, namely, system design and model compression, as well as an in-depth comparison of the innovativeness, strengths and limitations of these two types of approaches, and an exploration of the underlying framework that supports the lightweighting model. In addition, scenarios in which lightweight networks have been successfully applied are described, and predictions are made for the future development trend of DCNN lightweighting, aiming to provide useful insights and references for the research on lightweight deep convolutional neural networks.

Research Progress on Recommendation Algorithms with Knowledge Graph Visualization Analysis

LIN Suqing, LUO Dingnan, ZHANG Shuhua

Computer Engineering and Applications 2024, 60 (21): 1-17. DOI: 10.3778/j.issn.1002-8331.2312-0032

The application and proliferation of internet technology has caused exponential growth in data, increasing the complexity of information retrieval from massive datasets. Recommendation algorithms have attracted significant attention for alleviating information overload, with relevant research findings continually emerging. A total of 4,773 Chinese and 4,531 English publications from 2012 to 2024 have been sourced from China National Knowledge Infrastructure (CNKI) and the Web of Science (WOS) core collection. The visualization tools CiteSpace and VOSviewer have been used to generate basic information and keyword co-occurrence graphs for the literature. Core technology keywords, including knowledge graph, graph neural network, and deep learning, have been extracted through graph analysis, and the corresponding representative recommendation algorithms have been selected. The core mechanisms and underlying principles of the algorithms have been visually presented through charts, focusing on the limitations and challenges of existing research, as well as targeted solutions. Knowledge architecture diagrams have been developed for the algorithms associated with each core technology keyword, following the challenge-solution-source literature framework. The visualization of recommendation principles has been effectively implemented.

Research Advance of Crack Detection for Infrastructure Surfaces Based on Deep Learning

HU Xiangkun, LI Hua, FENG Yixiong, QIAN Songrong, LI Jian, LI Shaobo

Computer Engineering and Applications 2025, 61 (1): 1-23. DOI: 10.3778/j.issn.1002-8331.2407-0407

Civil infrastructure is prone to changes in physical condition or performance after long-term use, which can impair its function and service safety, so structural health monitoring of such facilities is essential. Crack detection is an extremely important part of structural health monitoring, and timely detection and identification of cracks can effectively prevent severe accidents. Crack detection methods based on computer vision are simple, fast and accurate, and are widely used for surface crack detection in civil infrastructure. This paper reviews deep learning-based crack detection methods for infrastructure surfaces from three detection directions: image classification, object detection, and semantic segmentation. Common data collection methods and commonly used public crack datasets are also summarized. Finally, the difficulties and challenges of deep learning-based surface crack detection for infrastructure are discussed, and possible future development directions are envisioned.

Review of Application of Visual Foundation Model SAM in Medical Image Segmentation

SUN Xing, CAI Xiaohong, LI Ming, ZHANG Shuai, MA Jingang

Computer Engineering and Applications 2024, 60 (17): 1-16. DOI: 10.3778/j.issn.1002-8331.2401-0136

With the continuous development of foundation model technology, visual foundation models represented by the segment anything model (SAM) have made significant breakthroughs in the field of image segmentation. SAM, driven by prompts, accomplishes a series of downstream segmentation tasks, aiming to address all image segmentation issues comprehensively. Therefore, the application of SAM in medical image segmentation is of great significance, as its generalization performance can adapt to various medical images, providing healthcare professionals with a more comprehensive understanding of anatomical structures and pathological information. This paper introduces commonly used datasets for image segmentation and provides detailed explanations of SAM’s network architecture and generalization capabilities. It focuses on a thorough analysis of SAM’s application in five major categories of medical images: whole-slide imaging, magnetic resonance imaging, computed tomography, ultrasound, and multimodal images. The review summarizes the strengths and weaknesses of SAM, along with corresponding improvement methods. Combining current challenges in the field of medical image segmentation, the paper discusses and anticipates future directions for SAM’s development.

Algorithmic Research Overview on Graph Coloring Problems

SONG Jiahuan, WANG Xiaofeng, HU Simin, JIA Jingwei, YAN Dong

Computer Engineering and Applications 2024, 60 (18): 66-77. DOI: 10.3778/j.issn.1002-8331.2403-0434

The graph coloring problem (GCP) is a classic combinatorial optimization problem that has been widely applied in fields such as mathematics, computer science, and biological science. Because the graph coloring problem is NP-hard, no exact polynomial-time algorithm for it is currently known. To support the design of efficient algorithms for this problem, it is necessary to review the existing ones. Existing algorithms are mainly divided into intelligent optimization algorithms, heuristic algorithms, reinforcement learning algorithms, and others. A comparative analysis is carried out in terms of algorithm principles, improvement ideas, performance, and accuracy, summarizing the advantages and disadvantages of the algorithms and pointing out research directions and algorithm design paths for GCP, which offers guidance for research on related problems.
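
The abstract does not name specific algorithms, so as an illustration of the heuristic category only, here is a minimal sketch of greedy coloring with a largest-degree-first vertex order. The function name `greedy_coloring` and the 5-cycle example graph are assumptions made for this sketch. A greedy heuristic always produces a valid coloring but gives no guarantee of using the minimum number of colors.

```python
def greedy_coloring(adjacency):
    """Assign each vertex the smallest color not used by its colored neighbors.

    `adjacency` maps each vertex to a list of its neighbors (undirected graph).
    Vertices are visited in order of decreasing degree.
    """
    order = sorted(adjacency, key=lambda v: len(adjacency[v]), reverse=True)
    colors = {}
    for vertex in order:
        used = {colors[n] for n in adjacency[vertex] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[vertex] = color
    return colors


# Hypothetical example: a cycle on five vertices (an odd cycle needs three colors).
cycle5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
coloring = greedy_coloring(cycle5)
num_colors = max(coloring.values()) + 1
```

On this small graph the heuristic happens to reach the optimum of three colors. On harder instances the vertex order strongly affects the result, which is one reason the metaheuristic and learning-based approaches surveyed in the review exist.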

Review of YOLO Methods for Universal Object Detection

MI Zeng, LIAN Zhe

Computer Engineering and Applications 2024, 60 (21): 38-54. DOI: 10.3778/j.issn.1002-8331.2404-0130

As the first single-stage object detection algorithm in the era of deep learning, YOLO has sparked a wave of enthusiasm in the field of computer vision with its powerful and unique paradigm, and has become a milestone achievement in object detection algorithms. It is still a typical algorithm that achieves the best balance between speed and accuracy, and is widely used in industrial fields such as autonomous driving and intelligent vision systems. In the past eight years, driven by deep learning technology, YOLO methods have developed rapidly and have had a profound impact on the entire field of object detection. This paper conducts an in-depth investigation of work related to the YOLO method from the perspective of technological evolution, comprehensively summarizing the innovations and contributions of each iteration from the initial YOLO v1 to the latest YOLO v9 and YOLO v10. Based on the significant technological improvements at different time points, the YOLO method is divided into four parts: early basic YOLO, standard version YOLO, standard improvement YOLO, and unique improvement YOLO. The unique perspectives of the improvement methods in each period are introduced in detail. In addition, the datasets and indicators for evaluating the YOLO method are summarized, and detailed experimental results of different versions of YOLO and different models of the same version of YOLO are collected. The development and changes of YOLO are summarized from both macro and micro levels. Through analysis, the differences and inherent connections in the development framework, backbone network architecture, and prior box usage among different versions of YOLO are revealed, emphasizing the importance of balancing speed and accuracy in YOLO. Finally, through systematic review, the future development trends of the YOLO method are summarized.

Research on Intelligent Question Answering System Based on Large Language Model

REN Haiyu, LIU Jianping, WANG Jian, GU Xunxun, CHEN Xi, ZHANG Yue, ZHAO Changxu

Computer Engineering and Applications 2025, 61 (7): 1-24. DOI: 10.3778/j.issn.1002-8331.2409-0300

Intelligent question answering is a core subfield in natural language processing, aiming at systems that understand and answer natural language questions posed by users. Traditional question answering systems usually rely on predefined rules and limited corpora and are unable to handle complex multi-round dialogues. Large language models are natural language processing models based on deep learning technology, with billions or even hundreds of billions of parameters. They can not only understand and generate natural language but also significantly improve the accuracy and efficiency of question answering systems, promoting the development of intelligent question answering technology. In recent years, intelligent question answering based on large model technology has gradually become a research hotspot, but a systematic review in this field is still relatively lacking. Therefore, this article conducts a systematic review of intelligent question answering systems based on large models. Firstly, it introduces the basic concepts of question answering systems, datasets, and their evaluation metrics. Secondly, it presents question answering systems based on large models, including those based on prompt learning, knowledge graphs, retrieval-augmented generation, and intelligent agents, as well as the technical route of fine-tuning in question answering tasks, and compares the advantages, disadvantages, and application scenarios of the five methods in question answering systems. Finally, it summarizes the current research challenges and future development trends of question answering systems based on large language models.

Research Review on Deep Reinforcement Learning for Solving End-to-End Navigation Problems of Mobile Robots

HE Li, YAO Jiacheng, LIAO Yuxin, ZHANG Wenzhi, LU Zhaoqing, YUAN Liang, XIAO Wendong

Computer Engineering and Applications 2024, 60 (14): 1-13. DOI: 10.3778/j.issn.1002-8331.2312-0256

Autonomous navigation is the prerequisite and foundation for mobile robots to accomplish complex tasks. Traditional autonomous navigation systems rely on the accuracy of maps and cannot adapt to highly complex industrial and service scenarios. End-to-end navigation methods for mobile robots, which do not rely on a priori map information and make autonomous decisions through deep reinforcement learning and interaction with the environment, have become a new research hotspot. Most existing classifications cannot comprehensively summarize the challenges and opportunities of end-to-end navigation problems. Based on the characteristics of end-to-end navigation systems, the challenges of the navigation problem are attributed to three key issues: poor perception ability of navigation agents, ineffective learning, and poor generalization ability of navigation strategies. The research status and development trends of end-to-end navigation systems are described. Representative research results in recent years addressing these key issues are detailed in turn, and their advantages and shortcomings are summarized. Finally, future development trends of end-to-end navigation for mobile robots are discussed in aspects such as visual language navigation, multi-agent collaborative navigation, end-to-end navigation fusing super-resolution reconstructed images, and interpretable end-to-end navigation, providing insights for the research and application of end-to-end navigation for mobile robots.

Comprehensive Review of Large Language Model Fine-Tuning

ZHANG Qintong, WANG Yuchao, WANG Hexi, WANG Junxin, CHEN Hai

Computer Engineering and Applications 2024, 60 (17): 17-33. DOI: 10.3778/j.issn.1002-8331.2312-0035

The rise of large-scale language models signifies a new milestone in the field of deep learning, with fine-tuning techniques playing a crucial role in optimizing model performance. This paper provides a comprehensive overview of fine-tuning techniques for large-scale language models. It reviews the development stages of language models, including statistical language models, neural network language models, pre-trained language models, and large language models. The basic concepts of fine-tuning are explored, covering classic fine-tuning, efficient parameter fine-tuning, prompt tuning, and reinforcement learning fine-tuning. The paper delves into the principles and development of each fine-tuning technique, offering a comparative analysis across these four major categories. In conclusion, the paper summarizes the current state of research on fine-tuning techniques and underscores the potential research value in this domain, providing insights into future directions of development.

Improved Road Defect Detection Algorithm Based on YOLOv8

WANG Xueqiu, GAO Huanbing, JIA Zemeng

Computer Engineering and Applications 2024, 60 (17): 179-190. DOI: 10.3778/j.issn.1002-8331.2404-0288

Various defects can emerge on the road surface after prolonged use. Failing to promptly detect and repair these defects can significantly reduce the road’s lifespan and jeopardize driving safety. Consequently, real-time detection of road defects assumes paramount importance. However, traditional detection methods suffer from sluggish speed and hefty cost requirements. Hence, to tackle these challenges, a novel road detection algorithm called DML-YOLO is proposed, which builds upon the YOLOv8 framework. This algorithm integrates the MultiPath coordinate attention (MPCA) mechanism into the backbone network to enhance feature extraction. Additionally, the C2f-MPDC module is introduced to dynamically adjust the receptive field and improve detection capabilities. Furthermore, the network’s neck structure is redesigned, introducing a novel diversity feature pyramid network (DFPN) that reduces model size and fuses low-level feature maps to extract rich, detailed information and elevate the success rate of detecting small targets. Moreover, a lightweight shared convolutional detection head (LSCD head) is meticulously designed to enhance detection efficiency while reducing model size. Ultimately, extensive experimental results demonstrate that DML-YOLO achieves remarkable average detection precision, with mAP@0.5 scores of 89.6% on the RDD2022 dataset and 73.6% on the VOC2007 dataset, surpassing other models tested. Additionally, compared to the YOLOv8 model, DML-YOLO boasts a reduction of 32.37% in parameter count and 14.49% in computational workload, making it highly suitable for deployment in resource-constrained computing environments like embedded systems and mobile devices.

Review of Text Classification Methods Based on Graph Neural Networks

SU Yilei, LI Weijun, LIU Xueyang, DING Jianping, LIU Shixia, LI Haonan, LI Guanfeng

Computer Engineering and Applications 2024, 60 (19): 1-17. DOI: 10.3778/j.issn.1002-8331.2403-0142

Text classification is an important task in the field of natural language processing, aiming to assign given text data to a predefined set of categories. Traditional text classification methods can only handle data in Euclidean space and cannot process non-Euclidean data such as graphs. Text data with graph structure therefore cannot be processed directly by these methods, and the non-Euclidean structure in the graph cannot be captured. As a result, how to apply graph neural networks to text classification tasks is one of the current research hotspots. This paper reviews the text classification methods based on graph neural networks. Firstly, it outlines the traditional text classification methods based on machine learning and deep learning, and summarizes the background and principles of graph convolutional neural networks. Secondly, it elaborates on the text classification methods based on graph neural networks according to different types of graph networks, and conducts an in-depth analysis of the application of graph neural network models in text classification. Then, it compares the current text classification models based on graph neural networks through comparative experiments and discusses the classification performance of the models. Finally, it proposes future research directions to further promote the development of this field.

Research Progress on Multi-Agent Deep Reinforcement Learning and Scalability

LIU Yanfei, LI Chao, WANG Zhong, WANG Jieling

Computer Engineering and Applications 2025, 61 (4): 1-24. DOI: 10.3778/j.issn.1002-8331.2407-0034

Multi-agent deep reinforcement learning has shown great potential in solving agent collaboration, competition, and communication problems in recent years. However, as its application expands across more domains, scalability has become a focal concern and an important problem on the path from theoretical research to large-scale engineering applications. This paper reviews reinforcement learning theory and typical deep reinforcement learning algorithms, introduces three learning paradigms of multi-agent deep reinforcement learning and their representative algorithms, and briefly summarizes the current mainstream open-source experimental platforms. Then, this paper delves into the research progress on scalability in multi-agent deep reinforcement learning with respect to the number of agents and to scenarios, analyzes the main problems faced by each method, and presents existing solutions. Finally, the application prospects and development trends of multi-agent deep reinforcement learning are discussed, providing references and inspiration to further advance research in this field.

Knowledge Graph Embedding Model Incorporating Multi-Level Convolutional Neural Networks

LI Min, LI Xuejun, LIAO Jing

Computer Engineering and Applications 2025, 61 (6): 192-198. DOI: 10.3778/j.issn.1002-8331.2310-0360

Knowledge graph embedding projects entities and relations into a continuous low-dimensional embedding space to learn the triple features. The model based on translation cannot extract deep knowledge and has limited feature expression ability. Although the model based on neural network can extract deep knowledge, it is easy to lose shallow knowledge, and has weak feature interaction ability between entities and relations. In order to fully extract the shallow and deep features of triple in the model based on neural network, this paper introduces a knowledge graph embedding model incorporating multi-level convolutional neural networks called ConvM. ConvM model uses the recombination embedding method of cross-arrangement of head entities and relations to enhance the feature interaction between them. It also adopts the feature extraction module that combines dilated convolution with one-dimensional and three-dimensional convolution kernels to capture multiscale interaction features between entities and relations. In addition, ConvM model introduces a residual connection to improve the forgetting problem of original information. Five public datasets serve as the basis for conducting link prediction experiments. Experimental results demonstrate that ConvM model outperforms ConvE model, with MRR metric improved by 23.3%, 10.8%, and 12.2% on FB15k, FB15k-237, and Kinship datasets, respectively. These findings highlight the outstanding feature expression capability of ConvM model and its effective enhancement of link prediction performance.
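
For readers unfamiliar with the MRR metric cited above, the following minimal sketch shows how mean reciprocal rank is computed in link prediction, assuming each test triple has already been assigned the 1-based rank of its correct entity among all candidates. The function name `mean_reciprocal_rank` and the sample ranks are illustrative and are not data from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct entity for three test triples.
example_ranks = [1, 2, 4]
mrr = mean_reciprocal_rank(example_ranks)  # (1 + 0.5 + 0.25) / 3
```

MRR lies in (0, 1], and a value of 1 means the correct entity was ranked first for every test triple. Because of the reciprocal, improvements near the top of the ranking move the metric far more than improvements deep in the list.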

Improved YOLOv11n Small Object Detection Algorithm in UAV View

LI Bin, LI Shenglin

Computer Engineering and Applications 2025, 61 (7): 96-104. DOI: 10.3778/j.issn.1002-8331.2411-0072

In order to effectively deal with the challenges of complex background, dense target, target miniaturization and mobile terminal deployment faced by small target detection in UAV aerial photography, the YOLOv11n model is improved. Firstly, RFCBAMConv module is used to improve C3k2, which enhances the ability of feature extraction. Then, the dilated feature pyramid convolution (DFPC) module is designed to replace the original SPPF layer. Through multi-scale dilated convolution, the extraction of small target detail features of UAV is strengthened. Secondly, a new feature pyramid structure is proposed, and a feature map output of 160×160 size is added to the P2 layer to extract the feature information of small targets. This method replaces the traditional practice of adding P2 small target detection head. The CSPOK module and ContextGuidedBlock_Down (CGBD) convolution are introduced, which significantly improves the extraction ability of global features and the fusion ability of multi-scale features. Finally, the dynamic detection head (DyHead) is used to replace the original detection head, which improves the target detection accuracy of the model. The experimental results show that the mAP@0.5 and mAP@0.5:0.95 indicators of the improved model on the VisDrone dataset are increased by 0.071 and 0.049, respectively. In addition, the generalization experiments on AI-TOD and SODA-A datasets also show that the improved model achieves 0.055 and 0.048 improvement in mAP@0.5, respectively, which fully verifies the effectiveness and universality of the model.

Review of Medical Image Segmentation Algorithms Based on U-Net Variants

CUI Ke, TIAN Qichuan, LIAN Lu

Computer Engineering and Applications 2024, 60 (11): 32-49. DOI: 10.3778/j.issn.1002-8331.2310-0335

The simple and efficient network structure of U-Net is widely used in medical image segmentation, and many scholars have studied and extended the U-Net structure. Firstly, this paper summarizes the key challenges of the U-Net network in the field of medical image segmentation. Next, it elaborates on the formats and characteristics of medical image datasets commonly used with the U-Net network. Then, it summarizes six improvement mechanisms for U-Net: the skip connection mechanism, generative adversarial networks, the residual connection mechanism, 3D-UNet, the Transformer mechanism, and the dense connection mechanism. Finally, the paper discusses the relationship between these improvement mechanisms and commonly used medical data formats, and points out ideas and directions for future improvement, so as to further unlock the potential of U-Net in medical image segmentation.

Survey of Deep Learning Based Approaches for Gaze Estimation

WEN Mingqi, REN Luqian, CHEN Zhenqin, YANG Zhuo, ZHAN Yinwei

Computer Engineering and Applications 2024, 60 (12): 18-33. DOI: 10.3778/j.issn.1002-8331.2309-0497

Gaze estimation is a technique for predicting the gaze position or gaze direction of the human eye and plays an important role in human-computer interaction and computer vision applications. The recent development of deep learning has revolutionized many computer vision tasks, and using deep learning for appearance-based gaze estimation has also become a hot topic. Focusing on the training process of the deep learning model, this paper analyzes state-of-the-art gaze estimation methods from four perspectives: gaze data preprocessing, gaze feature extraction, gaze learning strategies, and deep gaze model structures. In addition, the mainstream public datasets are summarized, and the performance of 2D and 3D gaze estimation methods is evaluated and analyzed on several popular datasets. Finally, the challenges faced by existing gaze estimation methods are discussed, and future development directions are outlined.

Research and Progress on Super-Resolution Reconstruction Methods for Terahertz Images

JIANG Yuying, JIANG Mengdie, GE Hongyi, ZHANG Yuan, LI Guangming, CHEN Xinyu, WEN Xixi, CHEN Hao

Computer Engineering and Applications 2024, 60 (18): 1-16. DOI: 10.3778/j.issn.1002-8331.2401-0161

Image super-resolution has been an important research topic in image processing in recent decades, aiming to reconstruct high-resolution images from low-resolution ones. It overcomes the limitations imposed by the manufacturing process and cost of sensors and optical devices by improving image resolution algorithmically, making it a simple, efficient and low-cost approach. As an emerging technology, terahertz (THz) technology has been widely used in many fields. Owing to THz diffraction and scattering, THz images suffer from blur and unclear texture details, and a growing number of scholars are committed to developing super-resolution reconstruction methods for THz images. Based on a survey of recent literature on THz technology and super-resolution reconstruction, this paper elaborates on the three major reconstruction methods for THz images, focuses on deep learning-based methods, and compares the reconstruction effects, advantages and disadvantages of various algorithms. THz image quality assessment indexes and commonly used datasets are reviewed, and applications of THz image super-resolution reconstruction technology are summarized. Finally, the future development trend of THz image super-resolution reconstruction technology is discussed.

Overview of Causal Learning Techniques and Applications

LONG Xiangfu, LI Shaobo, ZHANG Yizong, YANG Lei, LI Chuanjiang

Computer Engineering and Applications 2024, 60 (24): 1-19. DOI: 10.3778/j.issn.1002-8331.2405-0407

Machine learning is the core of artificial intelligence and data science, and is widely used in education, transportation and manufacturing. With the development of machine learning and the extension of application fields, the models have revealed some problems to be solved in terms of interpretability and fairness. Causal learning (CL), as a method combining causality and machine learning techniques, can enhance the interpretability of the model and solve the problems of fairness, and its research has gradually become a hot spot in the academic world. Therefore, based on the introduction of the relevant theoretical knowledge of CL, the techniques of causal explanation, causal supervised learning, causal fairness, and causal reinforcement learning are firstly analyzed and outlined in an all-round way according to the problems that can be solved by CL. Secondly, the applications of CL in the fields of medicine, agriculture and intelligent manufacturing are summarized from multiple perspectives. Finally, some open problems and challenges of CL are summarized, and future research directions are given, aiming to promote the continuous development of CL.

Improved Road Object Detection Algorithm for YOLOv8n

GAO Deyong, CHEN Taida, MIAO Lan

Computer Engineering and Applications 2024, 60 (16): 186-197. DOI: 10.3778/j.issn.1002-8331.2403-0383

Abstract （200）

PDF（pc）（9556KB）（169）

To address the low detection accuracy and high missed detection rates caused by varying object scales and complex background interference in road scenes, an enhanced road object detection algorithm based on YOLOv8n is proposed. Firstly, the diverse branch block (DBB) is introduced to construct the C2fDBB module, which replaces the original C2f module and enhances the network's capacity to extract multi-scale features. Secondly, building upon the path aggregation network (PANet), the asymptotic feature pyramid network (AFPN) concept is leveraged to propose the path aggregation progressive feature pyramid network (PA-AFPN) feature fusion method, enhancing the network's ability to integrate multi-scale features. Additionally, the SPPF2_TA module, an SPPF (spatial pyramid pooling fast) with a dual-branch structure incorporating triplet attention (TA), is designed; it integrates multi-scale information through an average pooling branch and the TA mechanism, reducing the impact of background interference on detection. Finally, MPDIoU is adopted as the bounding box regression loss function in place of the original loss, expediting convergence and improving object localization precision. Experimental results on the public road benchmark datasets BDD100K and SODA10M show that the improved algorithm increases mAP@0.5 by 5.7 and 7.3 percentage points, respectively, over the baseline algorithm, while reducing the computational load by 0.6 GFLOPs. Compared with other mainstream object detection methods, the proposed algorithm shows notable advantages in FLOPs, FPS, and mAP@0.5, making it more suitable for object detection in road scenes.
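
As an illustration of the MPDIoU regression loss named in this abstract, the following minimal sketch computes it for a single pair of axis-aligned boxes. It follows the commonly cited MPDIoU definition (IoU minus the squared distances between matching top-left and bottom-right corners, each normalized by the squared image diagonal). The function name `mpdiou_loss` and the box layout `(x1, y1, x2, y2)` are assumptions made for this example; the abstract does not describe the authors' implementation.

```python
def mpdiou_loss(pred, target, img_w, img_h):
    """Illustrative MPDIoU loss for one box pair.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed layout).
    img_w and img_h are the width and height of the input image.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between the two top-left and the two bottom-right corners.
    d1_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2
    d2_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2

    # Normalize by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1_sq / norm - d2_sq / norm
    return 1.0 - mpdiou
```

Unlike plain IoU loss, the corner-distance terms keep the loss informative for non-overlapping boxes: it grows with the distance between corners instead of saturating at 1.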

Research on Unmanned Aerial Vehicle Swarm Resilience Assessment and Reconfiguration Technology

WEI Chenyue, HE Ming, HAN Wei, XU Xin, GAO Hong

Computer Engineering and Applications 2024, 60 (15): 1-10. DOI: 10.3778/j.issn.1002-8331.2401-0452

Abstract （200）

PDF（pc）（4418KB）（237）

In practical applications, unmanned aerial vehicle (UAV) swarms are often affected by disturbances such as terrain, wind, snow, rain, fog, and anti-aircraft strikes, which degrade swarm performance and mission capability. To effectively assess and improve the swarm's anti-disturbance capability, an in-depth study is carried out on UAV swarm resilience assessment indexes and resilience reconfiguration methods. Firstly, the current state of research on UAV swarm resilience assessment indexes is reviewed and analyzed. Secondly, research on UAV swarm resilience reconfiguration methods is summarized in terms of predictive reconfiguration and anti-disturbance reconfiguration. To address incomplete assessment indexes and the inability of swarms to reconfigure adaptively under multi-task, multi-disturbance conditions, multi-dimensional resilience assessment indexes and a UAV swarm phase-change reconfiguration method are proposed, respectively. These further take into account the impact of coverage, energy consumption, and other factors on swarm performance, realize adaptive phase change for different task and disturbance types, and significantly improve the swarm’s ability to cope with disturbances. Finally, the paper concludes with an outlook on future development trends in UAV swarm resilience reconfiguration.

Review of Lung CT Image Lesion Region Segmentation Based on Deep Learning

LI Xiaotong, MA Sufen, SHENG Hui, WEI Guohui, LI Xintong

Computer Engineering and Applications 2025, 61 (4): 25-42. DOI: 10.3778/j.issn.1002-8331.2403-0315

Abstract （198）

PDF（pc）（4394KB）（218）

Lung cancer poses a serious threat to people’s lives and health. Lesion areas in lung CT images are complex and diverse in morphology, and achieving high-precision segmentation of them has become a highly challenging key issue in computer-aided diagnosis. Deep learning-based segmentation of lung lesion regions not only helps doctors diagnose early lung cancer quickly and accurately, but also has important clinical value for treatment. To support in-depth research on lung lesion segmentation techniques, common datasets and evaluation metrics are introduced. Deep learning models for lung lesion region segmentation are reviewed in three categories: models based on convolutional neural networks, models based on U-Net, and models based on generative adversarial networks. The innovations of domestic and international research over the past 5 years are summarized through specific experiments. The segmentation performance of the various models is compared and analyzed, their advantages and disadvantages are summarized, and development directions in this field are discussed.

Improved YOLOv8 Urban Vehicle Target Detection Algorithm

XU Degang, WANG Shuangchen, WANG Zaiqing, YIN Kedong

Computer Engineering and Applications 2024, 60 (18): 136-146. DOI: 10.3778/j.issn.1002-8331.2401-0277

Abstract （191）

PDF（pc）（6421KB）（181）

To address missed detections, low precision, and weak generalization in urban vehicle target detection algorithms for complex traffic scenes, an enhanced YOLOv8 algorithm is proposed. Firstly, the C2f module in the backbone network is replaced with an improved GAM-C2f structure to balance computational efficiency and model accuracy. Secondly, an SPPFAPGC module is designed to prevent the local feature loss caused by max pooling operations in the SPPF structure. This enriches the feature map and is combined with a small target detection head to strengthen the detection of distant small vehicles while integrating local and global features. Finally, to suppress harmful gradients generated by low-quality images, the WIoU loss function is used instead of CIoU for improved bounding box regression performance, faster convergence, and higher regression accuracy. Experimental results on street vehicle datasets show that, compared with the baseline model YOLOv8n, the improved algorithm increases mAP50 by 1.6 percentage points and Recall by 2.0 percentage points, effectively alleviating the poor detection performance for small vehicles in urban traffic scenes. Verification on the VisDrone2019 dataset also shows improvements of 1.1 percentage points in mAP50 and 1.6 percentage points in Recall, further confirming the advantage of the enhanced algorithm over other mainstream algorithms in accuracy and recall for urban vehicle detection tasks.

Review of Object Detection Based on Event Cameras

ZHANG Yali, TIAN Qichuan, TANG Chaolin

Computer Engineering and Applications 2024, 60 (13): 23-35. DOI: 10.3778/j.issn.1002-8331.2312-0322

Abstract （190）

PDF（pc）（5613KB）（244）

Event cameras are imaging sensors that mimic the biological retina, offering high dynamic range, low latency, high temporal resolution, and low power consumption. They overcome the difficulty that traditional cameras have in capturing and recognizing objects under high dynamic range conditions, and these characteristics make event camera-based object detection a meaningful subject of study. This paper first briefly describes the current status, development history, advantages, and challenges of event cameras, then introduces the working principles of various types of event cameras and several event camera-based object detection algorithms, and finally discusses the challenges and future trends of such algorithms and summarizes the article.

Lightweight Road Damage Detection Method Based on Improved YOLOv8

XU Tiefeng, HUANG He, ZHANG Hongmin, NIU Xiaofu

Computer Engineering and Applications 2024, 60 (14): 175-186. DOI: 10.3778/j.issn.1002-8331.2402-0243

Abstract （187）

PDF（pc）（7736KB）（160）

To address the large memory footprint, high computational complexity, and difficulty in meeting real-time detection requirements of road damage detection models in complex scenes, a lightweight road damage detection model, DGE-YOLO-P, is proposed for complex natural scenes. Firstly, the C2f_DCNv3 module is designed by fusing deformable convolution into the C2f module of the network, which enhances the modelling of object deformation, and the input feature information is reduced in dimensionality to effectively reduce the number of model parameters and the computational complexity. Then, the GS-Decoupled head detection module is designed to reduce the parameters of the detection head while realising effective aggregation of global information. At the same time, the E-Slide Loss weight function is designed to assign higher weights to difficult samples so that the model fully learns the difficult samples in road damage data, further improving detection accuracy. Finally, channel pruning is used to remove redundant channels, which effectively compresses the model and improves detection speed. The experimental results show that the mAP of the DGE-YOLO-P model is 2.4 percentage points higher than that of the YOLOv8n model, while the number of model parameters, computational cost, and model size are reduced by 58.1%, 66.7%, and 55.5%, respectively. The detection speed increases from 34 frame/s to 51 frame/s.

Lightweight YOLOv8 Detection Algorithm for Small Object Detection in UAV Aerial Photography

LI Yanchao, SHI Weiya, FENG Can

Computer Engineering and Applications 2024, 60 (17): 167-178. DOI: 10.3778/j.issn.1002-8331.2402-0230

Abstract （186）

PDF（pc）（7882KB）（167）

To address the difficulty of feature extraction and the tendency of small targets to be overwhelmed by noise in complex scenes in unmanned aerial vehicle (UAV) images, this paper proposes a UAV target detection algorithm called SC-YOLO based on YOLOv8s. Firstly, to learn positional details of regions of interest, a self-position module (SPM) attention based on coordinate attention (CA) is presented. Secondly, to mitigate the impact of channel compression in the Carafe upsampling operator, a Carafe enhancer module (CEM) is proposed. Finally, by analyzing the relationship between the gradient gain function and the size of targets in the dataset, this paper enables WIoU_v3 to focus more on the general quality anchor boxes for medium and small targets. This is validated on the VisDrone2019 dataset, where it is found that WIoU_v3 can better target the parameter setting range for general quality anchor boxes of medium and small targets. The improved YOLOv8s algorithm achieves a mean average precision (mAP) of 43.1% on the VisDrone2019 validation set and 34.8% on the test set, demonstrating superior detection performance among algorithms of similar scale in recent years. The improved algorithm adds only 1.1×10^6 parameters and 1.5 GFLOPs of floating point operations (FLOPs), yet it increases detection accuracy by 2.0 and 2.1 percentage points on the validation and test sets, respectively. On the Tinyperson dataset, the detection accuracy is increased by 1.4 percentage points.

Review of Connected Autonomous Vehicle Cooperative Control at On-Ramp Merging Areas

LI Chun, WU Zhizhou, ZENG Guang, ZHAO Xin, YANG Zhidan

Computer Engineering and Applications 2024, 60 (12): 1-17. DOI: 10.3778/j.issn.1002-8331.2310-0310

Abstract （185）

PDF（pc）（5963KB）（262）

The area where ramp and mainline vehicles merge is referred to as the on-ramp merging area. Traffic efficiency in the merging area decreases drastically when the mainline and ramp traffic flow density reaches saturation. A current research hotspot in transportation, intelligent network technology relies on the high-precision motion control and high-efficiency communication of connected-automated vehicles (CAVs) and can significantly improve traffic efficiency in the merging area. This paper reviews the merging strategies used by CAVs under three control paradigms: feedback control, optimal control, and reinforcement learning. The shortcomings of the three methods in this scenario are summarized, and specific improvement measures are given based on a review of existing research. The paper also offers a thorough summary of the most recent developments and trends in this field.

Review of Application of BEV Perceptual Learning in Autonomous Driving

HUANG Deqi, HUANG Haifeng, HUANG Deyi, LIU Zhenhang

Computer Engineering and Applications 2025, 61 (6): 1-21. DOI: 10.3778/j.issn.1002-8331.2407-0501

Abstract （185）

PDF（pc）（2079KB）（213）

As the types of sensors used for data acquisition in the autonomous driving perception module continue to develop, it becomes increasingly difficult to represent multi-modal data uniformly. Bird's-eye view (BEV) perception learning in autonomous driving perception tasks can integrate multi-modal data into a unified feature space, giving it better development potential than other perception learning models. The reasons for this potential are summarized from five aspects: research significance, spatial deployment, preparation work, algorithm development, and evaluation metrics. From a framework perspective, BEV perception models can be grouped into four series: the Lift-Splat-Shoot (LSS) series, inverse perspective mapping (IPM) conversion, MLP view conversion, and Transformer view conversion. The input data fall into two categories: pure image feature input, which includes monocular and multi-camera input; and fused data input, which covers not only simple fusion of point cloud data and image features, but also knowledge distillation fusion guided or supervised by point cloud data and fusion based on height segmentation with guided slices. The paper provides an overview of four autonomous driving tasks in BEV perception models, namely multi-target tracking, map segmentation, lane detection, and 3D target detection, and summarizes the shortcomings of the four series of current BEV perception learning frameworks.

Improved Target Detection Algorithm for UAV Images with RT-DETR

JIANG Maoxiang, SI Zhanjun, WANG Xiaozhe

Computer Engineering and Applications 2025, 61 (1): 98-108. DOI: 10.3778/j.issn.1002-8331.2405-0331

Abstract （184）

PDF（pc）（5878KB）（190）

This paper proposes an improved RT-DETR algorithm for detecting targets in images captured by light and small unmanned aerial vehicles (UAVs). To address the low detection accuracy caused by flexible and diverse targets and complex, variable environments, the proposed method enhances the feature extraction capability of the detection model by integrating lightweight SimAM attention and inverted residual modules into the ResNet-r18 backbone network. Furthermore, a cascaded group attention mechanism is employed to optimize the inverted residual modules and feature interaction modules, improving feature selection capability and achieving refined acquisition of target detection information. Additionally, a 160×160 detection layer is introduced in the neck network to enhance the perception of small targets during the feature fusion stage. Finally, experimental results on the VisDrone2019 dataset show that the improved model has fewer parameters and higher detection accuracy. Further experiments on the Alver_Lab_Ulastirma and HIT-UAV datasets validate the effectiveness and robustness of the proposed improvements.

Survey on Automated Recognition and Extraction of TTPs

YU Fengrui

Computer Engineering and Applications 2024, 60 (13): 1-22. DOI: 10.3778/j.issn.1002-8331.2309-0489

Abstract （183）

PDF（pc）（7424KB）（228）

In the ever-evolving landscape of cyber threats, tactics, techniques and procedures (TTPs) play a crucial role in understanding malicious activities, providing a fine-grained perspective on the status of cybersecurity, and comprehensively illustrating cyber attack behaviors. Despite significant research efforts in the automated identification and extraction of TTPs, a comprehensive systematic review is currently lacking. This paper presents an in-depth analysis of progress in this area across three principal approaches: traditional natural language processing, machine learning, and large language models. The study categorizes the tasks into information extraction, text classification, and text generation, and presents a summary of the general framework for identification and extraction processes. It clearly defines the scope of unstructured text and TTPs, refines the processing and analysis procedures, and identifies innovative directions for each approach. Moreover, building upon existing research, the paper identifies current challenges and proposes future research directions and development opportunities. This comprehensive survey serves as a valuable literature review to support readers in applying advanced technologies and methods to advance research in this field.

LOL-YOLO: Low-Light Object Detection Incorporating Multiple Attention Mechanisms

JIANG Changjiang, HE Xuying, XIANG Jie

Computer Engineering and Applications 2024, 60 (24): 177-187. DOI: 10.3778/j.issn.1002-8331.2406-0424

Abstract （180）

PDF（pc）（7039KB）（202）

To address the challenges of low-illumination target detection, such as blurry night scenes, indistinct boundaries, and pronounced brightness disparities, this paper introduces LOL-YOLO (low-light YOLO), a detection method based on dynamic feature fusion. A self-correcting illumination module is incorporated to enhance low-light image quality and counteract target obscurity under low illumination. A dynamic feature extraction module is proposed, which leverages an attention mechanism combining large convolutional kernels with deformable convolutions, enabling extensive and agile capture of contextual information. Finally, a dynamic detection head is devised to augment perception of varying scales, spatial positions, and tasks, thereby refining detection accuracy and robustness. Experimental validation on the ExDark, DarkFace, and NPD (nighttime pedestrian detection) datasets demonstrates significant accuracy improvements over prevalent algorithms, confirming the effectiveness of the proposed method.

Survey on Lane Line Detection Techniques for Classifying Semantic Information Processing Modalities

HONG Shuying, ZHANG Donglin

Computer Engineering and Applications 2025, 61 (5): 1-17. DOI: 10.3778/j.issn.1002-8331.2406-0160

Abstract （176）

PDF（pc）（2981KB）（206）

With the rapid development of autonomous driving technology, lane line detection, as its key component, has attracted widespread attention and shown great potential for application in intelligent transportation systems. However, traditional lane line detection techniques usually struggle to provide satisfactory recognition accuracy when dealing with complex environmental challenges. This paper reviews the development of lane detection technology, systematically surveys 84 advanced algorithms, and divides them into four categories based on semantic processing: semantic segmentation assistance, semantic information fusion, semantic information enhancement, and semantic relationship modeling. By deeply analyzing the technical characteristics and advantages of these algorithms, the main limitations of current lane line detection technology are revealed. Finally, future development directions of lane line detection technology are put forward, especially in the utilization of semantic information, and potential research directions are pointed out.

Review of Development of Visual-Inertial Joint Calibration

ZHAO Junyang, LYU Shenhua, LI Yongxu, ZHU Huixin, ZHANG Kefan

Computer Engineering and Applications 2025, 61 (8): 1-16. DOI: 10.3778/j.issn.1002-8331.2409-0330

Abstract （173）

PDF（pc）（1197KB）（216）

The joint use of a camera and an IMU (inertial measurement unit) can fully leverage the complementary advantages of the two sensors, enabling data fusion and mutual calibration. In recent years, a variety of intelligent joint calibration methods have emerged, but a unified summary and analysis is lacking. Therefore, visual-inertial joint calibration methods are classified and organized in a unified way to analyze the application characteristics and limitations of the various approaches and to provide a better basis for choosing among them in applications or research. Firstly, this paper introduces the calibration parameters and principles for both the camera and the IMU, discussing them from temporal and spatial perspectives. Secondly, it classifies and comparatively analyzes online and offline temporal calibration methods. From a spatial perspective, the paper categorizes calibration methods, based on the distinct principles of IMU and camera calibration, into four types: optimization-based calibration, decoupled model-based calibration, filtering-based calibration, and machine learning-based calibration, and evaluates the advantages and characteristics of each approach. Finally, the paper concludes by proposing future development trends of joint calibration: spatiotemporal unified calibration, a greater variety of calibration toolkits, the expansion of machine learning applications, and multi-sensor joint calibration, among others.

Review of Research Progress in Object Detection Driven by Deep Learning

SHAN Xianying, ZHANG Lin, LI Zehui

Computer Engineering and Applications 2025, 61 (1): 24-41. DOI: 10.3778/j.issn.1002-8331.2407-0038

Abstract （173）

PDF（pc）（7781KB）（167）

In recent years, deep learning, driven by high-performance GPU computing, has rapidly expanded into security, healthcare, and industry. Object detection models have evolved from traditional methods to convolutional neural networks (CNN), significantly saving resources. This review outlines the development of object detection and recent advances in deep learning by referencing extensive literature and following a two-stage framework. It compares model performance across different datasets, summarizes the strengths and weaknesses of various methods, and highlights key datasets. The review also discusses the practical applications of object detection algorithms, particularly in autonomous driving, medical imaging, and remote sensing. Finally, it explores the opportunities and challenges for future research in deep learning-driven object detection.

Research and Comprehensive Review on Multi-Modal Knowledge Graph Fusion Techniques

CHEN Youren, LI Yong, WEN Ming, SUN Chi

Computer Engineering and Applications 2024, 60 (13): 36-50. DOI: 10.3778/j.issn.1002-8331.2309-0481

Abstract （166）

PDF（pc）（6082KB）（145）

Multi-modal knowledge graphs (MMKGs) integrate information from various modalities, such as vision and text, and present knowledge structures graphically. With the advancement of artificial intelligence, MMKGs have played a significant role in recommendation systems, intelligent Q&A, knowledge search, and other fields. Compared to traditional knowledge graphs, MMKGs can understand and present knowledge in multiple dimensions, possessing superior representation and application capabilities. To study MMKGs in depth, this review first analyzes and explains in detail the value and categories of MMKGs. Based on different construction methods, it compares and summarizes multi-modal knowledge extraction, representation learning, entity alignment, and other aspects, and categorizes multi-modal knowledge integration methods. It then analyzes progress in the applications of MMKGs, discusses their limitations, and proposes future research directions in the field.

LF-YOLO for Strip Surface Defect Detection in Industrial Scenes

MA Xiaoyao, LI Rui, LI Zili, ZHAI Wenzheng

Computer Engineering and Applications 2024, 60 (18): 78-87. DOI: 10.3778/j.issn.1002-8331.2404-0411

Abstract （164）

PDF（pc）（4872KB）（174）

To address the low accuracy of traditional defect detection algorithms in practical applications, which results from the small size of strip surface defects and the blurriness of images collected in industrial scenarios, an LF-YOLO algorithm for strip surface defect detection in industrial scenarios is proposed. A local filling upsampling module is designed to upsample the input pixels, improving the recognition of blurred images and reducing the missed detection rate of small target defects. The FReLU activation function, which is designed for visual tasks, is introduced to improve the model's defect localization accuracy. In addition, a lightweight local attention mechanism is proposed and combined with the feature extraction module C2f to enhance the extraction of features from defects of different sizes. Experimental results on the open source strip steel datasets NEU-DET (from Northeastern University) and GC10-DET show that the average detection accuracy of the improved model is 7.0 and 15.4 percentage points higher, respectively, than that of the original YOLOv8 algorithm, and is better than that of other classic target detection models. The validity of each module is further verified through ablation experiments.

YOLOv8 Crack Defect Detection Algorithm Based on Multi-Scale Features

ZHAO Baiting, CHENG Ruifeng, JIA Xiaofen

Computer Engineering and Applications 2024, 60 (22): 261-270. DOI: 10.3778/j.issn.1002-8331.2404-0332

Abstract （160）

PDF（pc）（4458KB）（124）

To solve the problems of low detection efficiency and missed detections caused by the complex background and widely varying aspect ratios of shaft lining cracks, a crack defect detection model with multi-scale features, EDG-YOLO, is proposed. Firstly, the feature extraction module EIRBlock (efficient inverted residual block) is designed, and C2fEIR is constructed to enhance the ability of the backbone network to extract shallow crack feature information. Secondly, the CSP_EDRAN (CSP efficient dilated reparam aggregation network) is fused into the neck to reuse crack feature information and promote interaction between shallow and deep semantic information. Meanwhile, the DAM (dual attention module) attention mechanism is embedded to enhance the representation of shaft lining crack features. Finally, a lightweight detection head, GDetect, is constructed, and the network is made more lightweight with the help of the GSConv module. Experimental results on a self-made shaft lining crack dataset show that the average detection accuracy of EDG-YOLO is 87.4%, which is 2.3 percentage points higher than that of YOLOv8, while the number of parameters and the amount of computation are reduced by 33% and 47%, respectively. The inference time for a single image is 13.2 ms, which meets the real-time detection requirements of downhole scenes.

Lightweight Face Recognition Algorithm Combining Transformer and CNN

LI Ming, DANG Qingxia

Computer Engineering and Applications 2024, 60 (14): 96-104. DOI: 10.3778/j.issn.1002-8331.2311-0276

Abstract （159）

PDF（pc）（3685KB）（227）

With the development of deep learning, convolutional neural networks have become the mainstream approach for face recognition (FR) by gradually expanding the receptive field through stacking convolutional layers to integrate local features. However, this approach suffers from the drawbacks of neglecting global semantic information of faces and lacking attention to important facial features, resulting in low recognition accuracy. Additionally, the stacking of a large number of parameters and layers poses challenges for deploying the network on resource-constrained devices. Therefore, a highly lightweight face recognition algorithm called gcsamTfaceNet is proposed, which combines Transformer and CNN. Firstly, a depthwise separable convolution is used to construct the backbone network in order to reduce the parameter count of the algorithm. Secondly, a channel-spatial attention mechanism is introduced to optimize the selection of features in both the channel and spatial domains, thereby improving the attention given to important facial regions. Building upon this, the Transformer module is integrated to capture the global semantic information of the feature maps, overcoming the limitations of convolutional neural networks in modeling long-range semantic dependencies and enhancing the algorithm’s ability to perceive global features. The gcsamTfaceNet, with a parameter count of only 6.5×10^5, is evaluated on nine validation datasets including LFW, CA-LFW, CP-LFW, CFP-FP, CFP-FF, AgeDB-30, VGG2-FP, IJB-B, and IJB-C. It achieves average accuracies of 99.67%, 95.60%, 89.32%, 93.67%, 99.65%, 96.35%, 93.36%, 89.43%, and 91.38% on these datasets, respectively. This demonstrates a good balance between parameter count and performance.
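
To illustrate why a depthwise separable convolution reduces the parameter count, the following sketch compares it with a standard convolution for a single layer. The layer sizes used in the note below (64 input channels, 128 output channels, 3×3 kernel) are hypothetical and are not taken from gcsamTfaceNet; bias terms are ignored, and the function names are invented for this example.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def reduction_ratio(c_in, c_out, k):
    """Separable-to-standard parameter ratio, equal to 1/c_out + 1/k**2."""
    return depthwise_separable_params(c_in, c_out, k) / standard_conv_params(c_in, c_out, k)
```

For the hypothetical 64-to-128-channel 3×3 layer, the standard convolution has 73,728 weights and the separable version 8,768, roughly 12% of the original.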

Improved Lightweight Military Aircraft Detection Algorithm of YOLOv8

LIU Li, ZHANG Shuo, BAI Yu’ang, LI Yujian, ZHANG Chuxia

Computer Engineering and Applications 2024, 60 (18): 114-125. DOI: 10.3778/j.issn.1002-8331.2404-0058

Abstract （158）

PDF（pc）（5596KB）（134）

Military aircraft detection in remote sensing images is of great significance in reconnaissance and early warning, intelligence analysis, and other fields. To make the military aircraft detection model run efficiently on equipment with limited computing power, YOLOv8n is made more lightweight in two respects: network design and model compression. In terms of network design, firstly, FAS_C2f is used to replace the C2f module in the original backbone network, which reduces computational redundancy and speeds up feature extraction. Secondly, the network structure is optimized according to the scale characteristics of military aircraft targets to alleviate the loss of small target information caused by excessive downsampling. Thirdly, Inner-SIoU is used as the new localization regression loss function to improve learning on small target samples and accelerate the convergence of bounding box regression. In terms of model compression, channel pruning based on the LAMP score is used to compress the redesigned model, further reducing the parameters and model size. With channel-wise knowledge distillation (CWD), the accuracy of the model is restored to a level close to that before pruning. The experimental results show that, on the open military aircraft dataset MAR20, the mAP of the lightweight model is 97.2% and its size is only 0.7 MB, which is 88.3% smaller than the original model, and the FPS is increased by 14 frames per second, which meets the real-time requirements of military aircraft target detection.
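
The LAMP score mentioned in this abstract can be sketched as follows, under the assumption that it refers to the layer-adaptive magnitude-based pruning score: within one layer, each weight's squared magnitude is divided by the sum of squared magnitudes of all weights in that layer that are at least as large. The function below computes this per-weight score for a flat list of weights. How the authors aggregate scores into channel-level pruning decisions is not stated in the abstract, so that step is omitted, and the name `lamp_scores` is invented for this example.

```python
def lamp_scores(weights):
    """Per-weight LAMP scores for one layer (illustrative sketch).

    score(u) = w[u]**2 / sum of w[v]**2 over all v whose magnitude
    is at least as large as that of w[u].
    """
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    scores = [0.0] * len(weights)
    tail = 0.0
    # Walk from the largest magnitude down, accumulating the tail sum of squares.
    for i in reversed(order):
        tail += weights[i] ** 2
        scores[i] = weights[i] ** 2 / tail if tail > 0 else 0.0
    return scores
```

Because the largest weight in every layer always receives a score of 1, ranking weights globally by this score never removes an entire layer, which is one motivation usually given for using it instead of raw magnitude.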

Most Read articles