Computer Engineering and Applications

Survey on Lane Line Detection Techniques for Classifying Semantic Information Processing Modalities

HONG Shuying, ZHANG Donglin

Computer Engineering and Applications 2025, 61 (5): 1-17. DOI: 10.3778/j.issn.1002-8331.2406-0160

With the rapid development of autonomous driving technology, lane line detection, as one of its key components, has attracted widespread attention and shown great potential for application in intelligent transportation systems. However, traditional lane line detection techniques usually struggle to provide satisfactory recognition accuracy in complex environments. This paper reviews the development of lane line detection technology, systematically organizes 84 advanced algorithms, and divides them into four categories based on how they process semantic information: semantic segmentation assistance, semantic information fusion, semantic information enhancement, and semantic relationship modeling. By analyzing the technical characteristics and advantages of these algorithms in depth, the paper identifies the main limitations of current lane line detection technology. Finally, it proposes future development directions for lane line detection, especially in the use of semantic information, and points out potential research directions.

Review of Application of BEV Perceptual Learning in Autonomous Driving

HUANG Deqi, HUANG Haifeng, HUANG Deyi, LIU Zhenhang

Computer Engineering and Applications 2025, 61 (6): 1-21. DOI: 10.3778/j.issn.1002-8331.2407-0501

As the types of sensors feeding the autonomous driving perception module continue to diversify, representing multi-modal data uniformly becomes increasingly difficult. BEV perception learning can integrate multi-modal data into a unified feature space, which gives it better development potential than other perception learning models. The reasons for this potential are summarized from five aspects: research significance, spatial deployment, preparation work, algorithm development, and evaluation metrics. From a framework perspective, BEV perception models can be grouped into four series: the Lift-Splat-Lss series, IPM (inverse perspective mapping), MLP view conversion, and Transformer view conversion. The input data fall into two categories. The first, pure image feature input, includes monocular camera input and multi-camera input. The second, fusion data input, covers not only the simple fusion of point cloud data and image features but also knowledge distillation fusion guided or supervised by point cloud data and fusion based on height segmentation by guided slices. The paper then gives an overview of four autonomous driving tasks in BEV perception models, namely multi-target tracking, map segmentation, lane detection, and 3D target detection, and summarizes the shortcomings of the four series of current BEV perception learning frameworks.

Overview of Multi-View 3D Reconstruction Techniques in Deep Learning

WANG Wenju, TANG Bang, GU Zehua, WANG Sen

Computer Engineering and Applications 2025, 61 (6): 22-35. DOI: 10.3778/j.issn.1002-8331.2405-0328

Classic multi-view 3D reconstruction methods have difficulty reconstructing complex objects, produce poor reconstruction results, and are hard to extend to high resolution; deep learning methods have therefore been introduced to reconstruct 3D models with higher accuracy. Multi-view 3D reconstruction algorithms based on deep learning are systematically summarized, analyzed, and compared, and the algorithms of recent years are classified according to explicit and implicit geometry representations. The focus is on neural implicit 3D reconstruction algorithms that combine implicit functions with volume rendering, which currently achieve high reconstruction accuracy, and quantitative and qualitative analyses are conducted on some of these algorithms. In addition, commonly used datasets and evaluation metrics are listed, and future research trends and development directions are discussed.

Escalator Passenger Safety Detection YOLO_BFROI Algorithm Based on Region of Interest

HOU Ying, HU Xin, ZHAO Ruirui, ZHANG Nan, XU Yanhong, MA Li

Computer Engineering and Applications 2025, 61 (6): 84-95. DOI: 10.3778/j.issn.1002-8331.2404-0405

Intelligent monitoring of escalators is an important means of preventing passenger accidents. However, the operating environment of escalators is complex and passengers often appear as small targets, which easily leads to missed and false detections. Therefore, a region-of-interest-based escalator passenger fall detection algorithm using an improved YOLOv8 is proposed in this paper. Firstly, a BiFormer_ROI attention module based on regions of interest is designed, and a small object detection module group consisting of SPD-Conv and BiFormer_ROI is constructed to improve the YOLOv8 backbone network, shielding the interference of non-escalator background areas and effectively improving the detection rate for small targets. Secondly, with practical industrial applications in mind, the lightweight GhostSlimPAFPN structure is adopted to optimize the Neck network, which effectively reduces the number of model parameters while maintaining detection accuracy. Finally, the PIoU v2 loss function with a target-size-adaptive penalty factor is adopted to improve the Head network, achieving faster convergence and higher detection precision. Experimental results on a self-built escalator passenger fall dataset show that the improved algorithm achieves 94.2% average detection precision at a detection speed of 87.7 FPS. It effectively reduces false and missed detections and can better ensure the safety of passengers on escalators.

Applications of Deep Learning in Knowledge Graph Construction and Reasoning

SUN Yu, LIU Chuan, ZHOU Yang

Computer Engineering and Applications 2025, 61 (6): 36-52. DOI: 10.3778/j.issn.1002-8331.2408-0280

Knowledge graphs, as a structured form of knowledge representation in the field of natural language processing, can describe real-world concepts and their relationships, and are often used in information retrieval, data management, and other fields. Deep learning has gradually become a research hotspot due to its ability to automatically learn the underlying patterns and hierarchical representations from diverse data, and it can be used for the precise construction of, and effective reasoning over, large-scale, high-quality knowledge graphs. To further promote the technological integration of deep learning and knowledge graphs, this paper focuses on the construction and reasoning processes of knowledge graphs, providing a comprehensive introduction to the relevant theories and latest research achievements in knowledge representation, knowledge extraction, knowledge fusion, and knowledge reasoning using deep learning. In line with recent research trends, the paper also highlights and summarizes the latest results on the integration of graph deep learning and knowledge reasoning applicable to graph data feature inference. Finally, an overview and technical outlook are given on the integration and development of deep learning and knowledge graphs, providing reference and ideas for future research directions.

Improving Lightweight Underwater Biological Detection Model of YOLOv8

MIN Feng, ZHANG Yuwei, LIU Yuhui, LIU Biao

Computer Engineering and Applications 2025, 61 (6): 96-105. DOI: 10.3778/j.issn.1002-8331.2408-0411

Efficient detection of underwater biological resources in complex natural environments is of great significance to China’s fisheries. To address the weak detection ability and insufficient generalization of the YOLO series in complex underwater environments, an underwater biological target detection method based on an improved YOLOv8n, SGDC-YOLOv8, is proposed. Firstly, the idea of deep supervision is integrated into the detection head, using shared receptive field attention convolution to improve detection accuracy while optimizing the receptive field. An additional supervised loss function is introduced to achieve efficient parameter sharing in the detection head. Secondly, to reduce computational cost and parameter count, a lightweight gated regularization unit convolution module is designed to reduce the burden on the model. To address the tendency of underwater biological target features to be blurred or lost, a shallow mixed pool downsampling module and a deep maximum pool downsampling module are proposed to optimize multi-scale feature fusion and ensure the accuracy and completeness of key data. Finally, a convolution and attention fusion module (CAFM) is added to the network to enhance global and local feature modeling. Experimental results on the publicly available DUO dataset show that, compared with the baseline model YOLOv8n, SGDC-YOLOv8 improves mAP@50 by 2.5 percentage points and mAP@50-95 by 1.8 percentage points, while reducing the parameter count by 14.62% and the computational complexity by 15.85%. FPS increases to 146.2, which is also the best performance compared with other mainstream object detection models.

Implementation of Meteorological Database Question-Answering Based on Large-Scale Model Retrieval-Augmentation Generation

JIANG Shuangwu, ZHANG Jiawei, HUA Liansheng, YANG Jinglin

Computer Engineering and Applications 2025, 61 (5): 113-121. DOI: 10.3778/j.issn.1002-8331.2406-0230

With the increasing demand for information retrieval and knowledge acquisition, question-answering systems are widely applied across various domains. However, there is a notable lack of research on specialized question-answering systems in the meteorological field, which severely limits the efficient utilization of meteorological information and the service efficiency of meteorological systems. To address this gap, this paper proposes a retrieval-augmented-generation-based question-answering scheme for meteorological databases. The scheme designs a multi-channel query routing (McRR) method based on relational databases (SQL) and document-oriented data (NoSQL). Additionally, to adapt large model queries to databases and enhance the model’s understanding of query tables, the paper proposes an instruction query conversion method and a database table summarization method (termed DNSUM) to improve the model’s semantic understanding of databases. Furthermore, by integrating key modules such as question understanding, re-rankers, and response generation, it constructs an end-to-end intelligent question-answering engine capable of retrieving relevant knowledge and generating answers from multiple data sources. Experimental results on the constructed meteorological question-answering dataset demonstrate that this engine effectively understands user queries and generates accurate answers, exhibiting strong retrieval and response capabilities. This research not only provides a question-answering solution for the meteorological field but also offers new directions for the application of question-answering technology in vertical domains.

X-Ray Image Contraband Detection Based on Improved YOLOv8s

YAN Zhiming, LI Xinwei, YANG Yi

Computer Engineering and Applications 2025, 61 (6): 141-149. DOI: 10.3778/j.issn.1002-8331.2403-0139

The variable size of contraband in X-ray images and mutual occlusion are the main reasons for the low detection accuracy of small-model target detection methods. To improve the accuracy of contraband detection under restricted model parameters, an improved small contraband detection network, YOLOv8SP, is proposed. To address the varying sizes of contraband and the difficulty of identifying small targets, a multi-size spatial pyramid pooling module is designed to realize multi-scale feature extraction using dense connections. To address the missed detections caused by mutual occlusion of contraband, a parallel attention module is designed to improve feature extraction for occluded objects. Extensive experiments show that YOLOv8SP achieves 94.27% detection accuracy on the SIXray dataset at a very small model scale, which is 2.13 percentage points higher than the original network, with a detection speed of 115 frames per second. It also has clear advantages in accuracy and computation speed over similar networks, which demonstrates the effectiveness of the designed algorithm.

Target Tracking Algorithm with Feature Fusion and Transformer Based Model Predictor

GONG Xiaomei, ZHANG Yi, HU Shu

Computer Engineering and Applications 2025, 61 (6): 254-262. DOI: 10.3778/j.issn.1002-8331.2311-0077

Discriminative correlation filters (DCF) have achieved much success in visual tracking. However, most of them rely only on the features extracted by the last layer of the backbone and ignore the rich structural information in low-level features. In view of this, a target tracking algorithm based on a feature fusion module and a Transformer-based model predictor is proposed. Firstly, a feature fusion module is introduced that integrates low-level and high-level features via a pyramidal structure. Then, a modified Transformer with an asymmetric positional encoding scheme is applied to predict the weights of the model, which effectively releases the expressive ability of the model. Finally, a feature refinement module is employed to optimize the search features. Compared with existing works, the tracker achieves better feature expression and more precise target localization. Extensive experiments on three mainstream datasets, TrackingNet, LaSOT, and UAV123, demonstrate that the tracker achieves strong tracking results.

Human Pose Estimation Based on Dual-Stream Fusion of CNN and Transformer

LI Xin, ZHANG Dan, GUO Xin, WANG Song, CHEN Enqing

Computer Engineering and Applications 2025, 61 (5): 187-199. DOI: 10.3778/j.issn.1002-8331.2406-0076

Convolutional neural network (CNN) and Transformer models are widely used in human pose estimation. However, Transformer focuses more on capturing the global features of images and overlooks the importance of local features for detailed human pose estimation. Conversely, CNN lacks the global modeling capabilities of Transformer. To fully leverage the strengths of CNN in processing local information and Transformer in capturing global information, this paper proposes a CNN-Transformer dual-stream parallel network architecture to aggregate rich feature information. A traditional Transformer requires flattening images into multiple patches, which is detrimental to extracting position-sensitive human structural information. Therefore, the multi-head attention structure is improved in this paper so that the model input can maintain the structure of the original 2D feature map. Additionally, a feature coupling module is introduced to fuse features from different resolutions of the two branches, maximizing the retention of both local and global features. Finally, an improved coordinate attention module is incorporated to further enhance the network’s feature extraction capability. Experimental results on the COCO and MPII datasets demonstrate that the proposed model achieves higher detection accuracy than current mainstream models, which indicates that it can effectively capture and integrate both local and global features of the human pose.

Review of Application of Spatiotemporal Graph Neural Networks in Internet of Things

ZHANG Jianwei, CHEN Xu, WANG Shuyang, JING Yongjun, SONG Jifei

Computer Engineering and Applications 2025, 61 (5): 43-54. DOI: 10.3778/j.issn.1002-8331.2404-0043

With the development of physical devices in various fields of the Internet of things (IoT), the large amount of data generated has brought challenges to current data processing methods. Deep learning models can process large-scale, high-dimensional data and have gradually been applied to different fields of the IoT. As a deep learning model for processing graph-structured data, the spatiotemporal graph neural network can model the topological structure and temporal information in the IoT and shows excellent performance in IoT prediction tasks. Firstly, the temporal and spatial correlations in the IoT, as well as the construction methods of different spatiotemporal network architectures, are introduced. Based on the difference in how spatial correlation is modeled, spatiotemporal graph neural networks are divided into spatiotemporal graph convolutional networks and spatiotemporal graph attention networks. Then, the application of both types of network in the IoT is further analyzed, mainly covering the fields of transportation, environment, and energy. Finally, the challenges faced by spatiotemporal graph neural networks in IoT applications and future research directions are discussed.

Review of SM9 Identity Authentication Schemes and Their Applications

CHEN Zeyu, LIU Lihua, WANG Shangping

Computer Engineering and Applications 2025, 61 (5): 18-31. DOI: 10.3778/j.issn.1002-8331.2408-0245

Commercial cryptography is essential to China’s cryptographic framework, serving as the foundation for national security. The SM9 algorithm is widely used in the field of identity authentication due to its certificate-free nature, ease of management, and low overall cost. The overall framework and key technologies of the SM9 algorithm are summarized and compared with similar algorithms. The progress of its identity authentication schemes, particularly in blind signatures, denial authentication signatures, ring signatures, and attribute signatures, is discussed. The application of the SM9 algorithm in blockchain security is highlighted, including scenarios such as privacy enhancement, facilitation of smart contracts, and cross-domain authentication, as is its application in Internet of things (IoT) security, namely the characteristics of schemes for industrial IoT, power IoT, and Internet of vehicles (IoV) security. Finally, the security of the SM9 algorithm in identity authentication is analyzed from multiple dimensions, which provides a new idea for algorithm evaluation and optimization.

Improvement of Early Warning Algorithm of YOLOv10 for Actors in Rooftop Photovoltaic Power-Related Area

LI Jun, FANG Zhiyuan, ZHOU Haoxing

Computer Engineering and Applications 2025, 61 (5): 211-221. DOI: 10.3778/j.issn.1002-8331.2407-0519

With the increasing scale of distributed photovoltaic (PV) installations, PV panels and their associated power generation systems installed on private rooftops may pose an electric shock hazard to users. To address boundary-crossing warnings and the limited computational capacity of regional monitoring equipment, this paper proposes an improved tracking and warning algorithm based on YOLOv10, DeepSort, and PNPOLY. To reduce the high computational complexity of feature fusion, the feature fusion module in YOLOv10 is redesigned using the concept of partial convolution. Furthermore, to enhance the detection of individuals occluded by PV panels, the Repulsion-IoU loss function is introduced, and a three-layer feature extraction structure (AFPN-3) is designed based on the asymptotic feature pyramid network for object detection (AFPN) to better integrate multi-level targets. On this basis, continuous individual tracking is performed using DeepSort, and the lightweight Fasternet replaces the original feature extraction network, reducing the model size and improving tracking quality. To accurately determine boundary crossing, the PNPOLY algorithm is applied to the coordinates of individuals’ feet. Experimental results show that, compared with YOLOv10n, the improved model reduces computational complexity by 18% and the number of parameters by 19%, while losing only 0.3 percentage points in detection accuracy. The improved DeepSort model increases the average tracking accuracy by 1.4 percentage points and reduces the model size to 8.7 MB. The proposed algorithm achieves a 94% accuracy rate in warning individuals in electrified rooftop PV areas, demonstrating that it is lightweight and accurate and meets the practical needs of tracking and warning individuals near rooftop PV panels.
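
The boundary-crossing check described in this abstract can be illustrated with a minimal sketch of a PNPOLY-style ray-casting test. The paper's actual code is not shown here; the function names, the rectangular zone, and the choice of the bottom-center of a bounding box as the "foot" point are assumptions made for illustration only.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray
    starting at `point` and going in the +x direction crosses.
    An odd number of crossings means the point is inside."""
    x, y = point
    inside = False
    j = len(polygon) - 1  # index of the previous vertex
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (yi > y) != (yj > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def foot_point(box: Tuple[float, float, float, float]) -> Point:
    """Hypothetical helper: approximate a person's feet as the bottom-center
    of a bounding box given as (x1, y1, x2, y2) in image coordinates."""
    x1, _, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)


# Assumed warning zone (a square) and one tracked bounding box.
zone = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
person_box = (4.0, 2.0, 6.0, 8.0)
should_warn = point_in_polygon(foot_point(person_box), zone)
```

In a tracking pipeline, a check of this kind would run once per tracked individual per frame, so its cost grows only with the number of polygon vertices.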

Beluga Whale Optimization with Improved Multi-Strategy Integration Problem

CHAI Yan, CHANG Xiaomeng, REN Sheng

Computer Engineering and Applications 2025, 61 (5): 76-93. DOI: 10.3778/j.issn.1002-8331.2403-0391

To further improve the optimization ability and convergence speed of beluga whale optimization (BWO), a multi-strategy improved beluga whale optimization (MIBWO) algorithm is proposed. To avoid the rapid convergence of the convergence factor in late iterations, a dynamic equilibrium search strategy is used to increase population diversity, and reverse solutions generated by quasi-reverse learning are used to enhance the quality of the initial solutions, which lays the foundation for the optimization performance of the algorithm. Theoretical analysis and numerical experiments show that the MIBWO algorithm has strong optimization performance. The MIBWO algorithm also shows good optimization performance, convergence speed, and robustness in PV identification, and has practical engineering application prospects.

Review of Research on Fusion Technology of Speech Recognition and Large Language Models

WANG Jingkai, QIN Donghong, BAI Fengbo, LI Lulu, KONG Lingru, XU Chen

Computer Engineering and Applications 2025, 61 (6): 53-63. DOI: 10.3778/j.issn.1002-8331.2405-0145

In the current era, various large language models (LLMs) have emerged, driving development and innovation in many fields of artificial intelligence. Summarizing the positive effects of LLMs on speech recognition technology and exploring their development prospects can provide innovative ideas for advancing the field. In current mainstream end-to-end speech recognition models, additional language models are often used to rescore the recognition results or are combined with the WFST algorithm to assist decoding, thereby improving recognition accuracy. Recent studies have found that integrating LLMs into the end-to-end training of speech recognition models can further enhance accuracy. The three types of methods for fusing speech recognition with language models, namely shallow fusion, deep fusion, and cold fusion, are taken as the main thread, and their principles, advantages, and disadvantages are analyzed. Recent experiments have confirmed that combining LLMs with acoustic models can effectively improve recognition accuracy. A systematic review of the research progress of LLMs in automatic speech recognition (ASR) shows that these models play an important role in the field. The integration of speech recognition and LLMs has gradually matured and merits further exploration and in-depth research.
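
Of the three fusion methods named above, shallow fusion is the simplest to sketch: the acoustic (ASR) model's log-probability for a hypothesis is combined with a weighted language model log-probability, and the highest combined score wins. The example below applies this to an n-best list as a simplification; the hypotheses, scores, and the `lm_weight` value are made-up illustrative numbers, not results from the reviewed papers.

```python
from typing import List, Tuple

# Each hypothesis: (text, acoustic log-probability, language model log-probability).
Hypothesis = Tuple[str, float, float]


def shallow_fusion_rescore(hypotheses: List[Hypothesis], lm_weight: float = 0.3) -> str:
    """Return the text with the highest fused score:
    score = log P_asr(y | x) + lm_weight * log P_lm(y)."""
    def fused_score(h: Hypothesis) -> float:
        _, am_logprob, lm_logprob = h
        return am_logprob + lm_weight * lm_logprob

    return max(hypotheses, key=fused_score)[0]


# Illustrative n-best list: the acoustically preferred hypothesis is implausible text.
nbest = [
    ("recognize speech", -4.0, -2.0),
    ("wreck a nice beach", -3.8, -6.0),
]
best_with_lm = shallow_fusion_rescore(nbest, lm_weight=0.3)
best_without_lm = shallow_fusion_rescore(nbest, lm_weight=0.0)
```

With the language model weight set to zero the acoustic score alone decides, which shows how the external model changes the outcome without retraining the recognizer.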

Improved Detection Method for Human Abnormal Behavior in Generative Adversarial Networks

ZHANG Hongmin, ZHENG Jingtian, YAN Dingding, TIAN Qianqian

Computer Engineering and Applications 2025, 61 (5): 147-154. DOI: 10.3778/j.issn.1002-8331.2310-0021

Reconstruction models based on generative adversarial networks may produce small reconstruction errors when reconstructing video frames. In addition, adversarial training in prediction and reconstruction models is often unstable, which affects detection performance. To address these issues, a bidirectional prediction network using both a predictor and a discriminator is proposed based on generative adversarial networks to detect abnormal human behavior in videos. The training process of this network is divided into two stages. In the first stage, a predictor extracts the temporal and spatial information of the input video frames, and an attention mechanism is introduced to focus on the actual motion area. The predictor predicts the intermediate frames of normal video frame sequences, and its state is preserved during training. In the second stage, the role of the discriminator changes from distinguishing between generated and real data to judging the quality of predicted frames. The discriminator learns to detect the subtle distortions that often occur in frames predicted from abnormal input, improving the stability of the training process and the accuracy of the detection results. The model achieves frame-level AUCs of 98.7%, 91.8%, and 84.6% on the UCSD Ped2, Avenue, and ShanghaiTech datasets, respectively, which are used for detecting abnormal human behavior in videos.
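
The frame-level AUC reported above can be read as the probability that a randomly chosen abnormal frame receives a higher anomaly score than a randomly chosen normal frame. The sketch below computes it directly from that pairwise definition; the scores and labels are invented for illustration, and the paper's own evaluation code may differ (for example in how scores are normalized per video).

```python
from typing import List


def frame_level_auc(scores: List[float], labels: List[int]) -> float:
    """AUC from the pairwise definition: the fraction of (abnormal, normal)
    frame pairs in which the abnormal frame has the higher anomaly score.
    Ties count as half. Labels: 1 = abnormal frame, 0 = normal frame."""
    abnormal = [s for s, l in zip(scores, labels) if l == 1]
    normal = [s for s, l in zip(scores, labels) if l == 0]
    if not abnormal or not normal:
        raise ValueError("need at least one abnormal and one normal frame")
    wins = 0.0
    for a in abnormal:
        for n in normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal) * len(normal))


# Illustrative per-frame anomaly scores and ground-truth labels.
perfect = frame_level_auc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
mixed = frame_level_auc([0.1, 0.9, 0.2, 0.8], [0, 0, 1, 1])
```

The double loop is quadratic in the number of frames, which is fine for a sketch; a rank-based computation would be preferable for full datasets.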

Survey of Emotion Generation for Emotional Dialogue

LIU Jia, MA Zhiqiang, LYU Kai, GUO Siyuan, ZHOU Yutong, XU Biqi

Computer Engineering and Applications 2025, 61 (5): 55-75. DOI: 10.3778/j.issn.1002-8331.2404-0011

Emotional dialogue is crucial in endowing conversational agents with the ability to handle emotions, aiming to equip them with capabilities in emotion recognition, understanding, and generation. To address the deficiency in emotion generation within emotional dialogue technology, the task of emotion generation has been proposed and developed as a core task in artificial intelligence for emotional expression. Its objective is to generate contextually appropriate artificial emotion categories and provide emotional guidance for relevant downstream tasks. Since 2018, researchers in the field of emotional dialogue have devoted themselves to enabling conversational agents to generate controllable emotions, exhibit strong empathy, and assist users in alleviating emotional distress. All these efforts highlight a strong demand for the task of emotion generation. Firstly, the definition and basic framework of the emotion generation task are provided, along with a brief introduction to the definitions of downstream tasks within the field of emotional dialogue. Secondly, the research status of the emotion generation task and its downstream tasks is summarized, along with commonly used datasets and evaluation metrics for this task. Finally, the challenges and future directions of the emotion generation task are outlined.

Polarizer Surface Defect Detection Algorithm Based on YOLOv8-S

SHENG Wei, ZHOU Yongxia, CHEN Junjie, ZHAO Ping

Computer Engineering and Applications 2025, 61 (6): 128-140. DOI: 10.3778/j.issn.1002-8331.2401-0382

As the polarizer market continues to expand and applications become more extensive, production requirements for polarizers are becoming increasingly stringent. To address complex defect morphology and the false and missed detection of small defects in polarizer surface defect detection, this paper proposes an improved polarizer surface defect detection algorithm based on YOLOv8-S. DCNv3 replaces the ordinary convolution in the C2f module of the backbone network and is combined with EMA to construct the DEC2f feature extraction module, which improves the backbone network's feature extraction capability for complex defects. A lightweight cross-scale feature refinement fusion module (LCFRFM) is constructed based on the feature refinement module to improve channel purification capability, reduce the number of parameters, and effectively fuse shallow backbone features across scales. The ConvMixer Layer is introduced to construct the CMC2f prediction head, whose larger prediction field of view brings stronger small-defect detection capability. SIoU replaces CIoU as the bounding box regression loss function, and AdamW replaces SGD as the optimizer during network training, improving detection accuracy and training convergence speed. Experimental results show that the proposed algorithm improves mAP50 and mAP50:95 by 2.4 and 2.9 percentage points, respectively, compared with YOLOv8-S, which demonstrates its effectiveness.

Research on Multimodal Hierarchical Feature Mapping and Fusion Representation Method

GUO Xiaoyu, MA Jing, CHEN Jie

Computer Engineering and Applications 2025, 61 (6): 171-182. DOI: 10.3778/j.issn.1002-8331.2310-0392

Multimodal feature representation serves as the foundation for multimodal tasks. To address the issue of a single-level fusion in existing multimodal feature representation methods, which fails to adequately capture the inter-modal relationships, a novel approach for multimodal hierarchical feature mapping and fusion representation is proposed. This method, built upon the text model RoBERTa and the image model DenseNet, extracts features from intermediate layers of both models spanning from low to high levels. Leveraging the concept of feature reuse, it maps and fuses features at different levels of the text and image modalities, capturing the internal relationships between text and image modalities and effectively integrating features between the two modalities. The hierarchical feature mapping and fusion representation is then fed into a classifier for sentiment classification in the context of multimodal sentiment analysis. A comparative analysis is also conducted between the constructed representation method and baseline representation methods. The experimental results indicate that the proposed representation method surpasses all baseline models in terms of sentiment classification performance on both the Weibo sentiment and MVSA-Multiple datasets. Specifically, it achieves a 0.0137 increase in F1 score on the Weibo dataset and a 0.0222 increase on the MVSA-Multiple dataset. Image features enhance sentiment classification accuracy under the single modality of text, but the degree of improvement is closely tied to the fusion strategy. The multimodal hierarchical feature mapping and fusion representation method effectively maps the relationship between text and image features, ultimately improving the effectiveness of sentiment classification in multimodal sentiment analysis.

Lightweight Low-Light Object Detection Algorithm Based on CDD-YOLO

SHI Lichen, YANG Chao, LIU Xuechao, ZHOU Xingyu

Computer Engineering and Applications 2025, 61 (6): 106-117. DOI: 10.3778/j.issn.1002-8331.2410-0127

Abstract （63）

PDF（pc）（2436KB）（71）

To address the challenges of low detection accuracy, high computational costs, and excessive memory consumption encountered by target detection algorithms in low-light conditions, this paper proposes a lightweight low-light target detection network model, CDD-YOLO, to enhance the performance of YOLOv8. Firstly, a multi-scale convolutional module based on a coordinate attention mechanism is proposed to extract texture features from different receptive fields and to capture long-range dependencies between spatial locations. Secondly, a dynamic head framework is integrated into the detection head to minimize the interference caused by complex backgrounds and scale variations. The bounding box regression loss function is designed using a dynamic non-monotonic focusing mechanism to enhance the regression path and quality of the anchor boxes, thereby improving the adaptability of the model to variations in lighting and noise. Finally, redundant parameters in the model are removed using a pruning algorithm to make the model lightweight. A self-constructed dataset, the ExDark dataset, and the VOC dataset are used for experimental validation. The experimental results show that the proposed method detects targets more accurately than mainstream algorithms and achieves a better balance between computational complexity and detection accuracy.

Computer Engineering and Applications 2025, 61 (6): 0-0.

Abstract （74）

PDF（pc）（693KB）（70）

Leakage-YOLO: Real-Time Object Detection Algorithm for Crack and Leakage in Tunnel Scenarios

CHEN Cansen, LIU Wei

Computer Engineering and Applications 2025, 61 (6): 118-127. DOI: 10.3778/j.issn.1002-8331.2408-0319

Abstract （58）

PDF（pc）（5927KB）（68）

The detection of cracks and water leakage in tunnel shield linings is essential for ensuring the structural safety and extending the service life of tunnels. With the advancement of object detection technologies, advanced techniques have been increasingly applied to the automatic detection of cracks and leakage areas in tunnel shield linings to improve detection efficiency and precision. Therefore, to further improve the precision of detecting these areas and to achieve real-time detection, the Leakage-YOLO algorithm is proposed, based on YOLOv8. The algorithm introduces a regional spotlight attention (RSA) module into the detection neck, which better integrates global and local feature information, thereby enhancing the ability to extract key regional features. This effectively addresses the challenge of extracting significant features in crack and leakage areas. Additionally, by modifying the detection head, a novel SE-Head structure is proposed, further enhancing the ability to extract detailed edge features, effectively improving the precision of crack and leakage area localization. Experimental results on public datasets in real-world scenarios demonstrate that the improved algorithm outperforms the original algorithm with increases of 4.7, 4.9, and 6.7 percentage points in AP, AP0.5, and AP0.75, respectively. Compared with other mainstream algorithms, the effectiveness and superiority of the Leakage-YOLO are further verified.

Improved YOLOv8s Model for Small Object Detection from Perspective of Drones

PAN Wei, WEI Chao, QIAN Chunyu, YANG Zhe

Computer Engineering and Applications 2024, 60 (9): 142-150. DOI: 10.3778/j.issn.1002-8331.2312-0043

Abstract （731）

PDF（pc）（5858KB）（852）

Object detection from the perspective of drones yields imprecise results because image targets are small and densely distributed, classes are unevenly distributed, and hardware conditions limit model size. An improved model based on YOLOv8s with multiple attention mechanisms is proposed. To solve the problem of shared attention weight parameters in receptive field features and to enhance feature extraction ability, receptive field attention convolution and the CBAM (convolutional block attention module) attention mechanism are introduced into the backbone, adding attention weights in the channel and spatial dimensions. By introducing large separable kernel attention into the feature pyramid pooling layers, information fusion between different levels of features is increased. Feature layers with rich semantic information about small targets are added to improve the neck structure. The inner-IoU loss function is used to improve the MPDIoU (minimum point distance based IoU) function, and the resulting inner-MPDIoU replaces the original loss function to enhance the learning ability for difficult samples. The experimental results show that the improved YOLOv8s model improves mAP, P, and R by 16.1%, 9.3%, and 14.9%, respectively, on the VisDrone dataset, surpasses YOLOv8m in performance, and can be effectively applied to unmanned aerial vehicle visual detection tasks.
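
The MPDIoU term named in the abstract can be illustrated with a short sketch. It follows the commonly cited definition (IoU minus the squared distances between the two boxes' top-left and bottom-right corners, each normalized by the squared image diagonal). The function names are hypothetical, and the inner-IoU auxiliary-box scaling that the paper combines with it is not reproduced here.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU between two boxes given as (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU from the intersection rectangle.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d_top_left = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d_bottom_right = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2

    # Normalize by the squared diagonal of the input image.
    norm = img_w ** 2 + img_h ** 2
    return iou - d_top_left / norm - d_bottom_right / norm


def mpdiou_loss(box_a, box_b, img_w, img_h):
    """Loss form: zero for a perfect match, larger for worse matches."""
    return 1.0 - mpdiou(box_a, box_b, img_w, img_h)
```

Unlike plain IoU, the corner-distance terms stay informative when the boxes do not overlap, which is one reason such variants are used for small, hard targets.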

Integrating Multilingual Knowledge for Implicit Aspect-Based Sentiment Analysis of MOOC Reviews

CHEN Huaibo, ZHANG Huibing, SHOU Zhaoyu, PAN Fang

Computer Engineering and Applications 2025, 61 (5): 104-112. DOI: 10.3778/j.issn.1002-8331.2405-0163

Abstract （50）

PDF（pc）（1271KB）（68）

The issue of low completion rates in MOOCs severely restricts their high-quality development. The implicit emotions contained in expressions such as metaphors, objective factual descriptions, sarcasm, and rhetorical questions in MOOC comments more genuinely reflect users’ learning experiences. Analyzing and utilizing this information to uncover student feedback on the courses and make corresponding improvements can help enhance MOOC completion rates. To this end, the paper proposes a MOOC implicit aspect sentiment analysis model integrating multilingual knowledge to obtain more accurate implicit sentiment information. To address the lack of clear emotional tendencies in the first two kinds of expression, a multi-graph neural network is introduced to combine multi-language knowledge such as part of speech, semantics, syntax, and semantic primitives, fully utilizing the associated relationships to uncover implicit emotional information in comments. Meanwhile, to address the issue of emotional words not matching the true sentiment polarity in the last two kinds of expression, a multi-level attention mechanism is constructed to capture emotional information at both the coarse-grained level of overall semantics and the fine-grained level of aspect words. On the MOOC dataset constructed for this paper, the model achieves accuracy and F1 scores of 90.2% and 93.8%, respectively. Comparative experiments on the SMP2019-ECISA dataset show that the proposed model improves accuracy by 1.7 percentage points over models such as KC-ISA-BERT.

Development and Application of Light Gradient Boosting Machine

WEI Jiamei, YUAN Shujuan, KONG Shanshan, YANG Aimin, ZHAO Chenying

Computer Engineering and Applications 2025, 61 (5): 32-42. DOI: 10.3778/j.issn.1002-8331.2405-0396

Abstract （62）

PDF（pc）（1043KB）（63）

Light gradient boosting machine (LightGBM) is one of the more powerful algorithms in the field of machine learning. LightGBM uses an efficient tree learning algorithm to train models faster. Its histogram-based binning, gradient-based one-side sampling, and leaf-wise tree growth reduce memory usage and computational cost. LightGBM is widely used in medicine, natural language processing, finance, industrial manufacturing, and other fields. However, LightGBM still faces many challenges in high-dimensional data processing, categorical feature processing, and model interpretability. At present, the methods for solving these problems mainly focus on feature engineering, visualization, and model mixing, and have achieved good results. Firstly, the algorithm principles and variants of the decision tree family are introduced. Secondly, the principles, advantages, and disadvantages of LightGBM are sorted out, the challenges faced by the algorithm are summarized, and the future research hot spots and difficulties of LightGBM are pointed out. Finally, the development of LightGBM is summarized and its prospects are discussed.
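
To make the histogram idea concrete, the following is a minimal sketch of histogram-based split finding on a single feature. It is an illustration under simplifying assumptions (equal-width bins, unit Hessians, a squared-error style gain), not LightGBM's actual implementation; the function name and parameters are hypothetical.

```python
def best_histogram_split(values, gradients, n_bins=4):
    """Find the best split threshold by scanning per-bin gradient sums."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    # Build the histogram: one gradient sum and one sample count per bin.
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1)
        grad_sum[b] += g
        count[b] += 1

    total_g, total_n = sum(grad_sum), sum(count)
    best_threshold, best_gain = None, 0.0
    left_g, left_n = 0.0, 0

    # Scan bin boundaries instead of every distinct feature value.
    for b in range(n_bins - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        gain = (left_g ** 2 / left_n + right_g ** 2 / right_n
                - total_g ** 2 / total_n)
        if gain > best_gain:
            best_threshold, best_gain = lo + (b + 1) * width, gain
    return best_threshold, best_gain
```

The scan costs one pass over the bins rather than over the sorted samples, which is where the memory and time savings mentioned above come from.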

Adaptive Separation of Knowledge Distillation for Remote Sensing Object Detection

YANG Xiaoyu, GU Jinguang

Computer Engineering and Applications 2025, 61 (6): 295-303. DOI: 10.3778/j.issn.1002-8331.2311-0170

Abstract （36）

PDF（pc）（14318KB）（56）

In recent years, deep models have achieved great success in large-scale applications, but issues such as computational complexity and storage requirements make them difficult to deploy on resource-limited devices. Knowledge distillation (KD) is a method for compressing models; however, existing methods do not consider the characteristics of remote sensing datasets. Specifically, because remote sensing images have complex backgrounds and small target objects, applying existing knowledge distillation methods directly introduces a large amount of noise, which degrades training performance. Therefore, the adaptive separation of knowledge distillation (ASKD) method is proposed. ASKD allows the student model to automatically select multi-scale core features to reduce noise, and at the same time effectively suppresses background interference by separating global and local features. ASKD achieves excellent performance with both single-stage and two-stage detectors on the LEVIR and SSDD datasets. For example, with a Faster RCNN detector using a ResNet-18 backbone, ASKD achieves 59.2% mAP on SSDD, which is 2.0 percentage points higher than the baseline model and even better than the teacher model.
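
For readers unfamiliar with knowledge distillation, the sketch below shows the classic logit-based formulation with a softening temperature. This is the generic baseline, not the ASKD feature-level method described in the abstract; the function names and the default temperature are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge.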

Low-Light Image Enhancement Using Brightness and Signal-to-Noise Ratio Guided Transformer

DU Xiaogang, LU Wenjie, LEI Tao, WANG Yingbo

Computer Engineering and Applications 2025, 61 (6): 263-272. DOI: 10.3778/j.issn.1002-8331.2312-0361

Abstract （43）

PDF（pc）（1970KB）（55）

The enhanced images generated by some existing low-light image enhancement methods suffer from uneven brightness, poor denoising, and a lack of detail. To solve these issues, this paper proposes BSGFormer, a low-light image enhancement network based on a brightness and signal-to-noise ratio guided Transformer. This network has the following advantages. A brightness and signal-to-noise ratio generation sub-network is designed to extract global illumination information and locate dark areas with missing information. The Transformer is guided by the brightness and signal-to-noise ratio feature maps to extract long-distance features only from dark areas with missing information, which reduces the computational complexity. Meanwhile, the subsequent feature fusion module is guided to enrich the details of dark areas with the help of bright area information and achieve information sharing. A cross-fusion attention module is designed and introduced between the encoder and decoder, improving the ability of the network to retain image details. Experimental results on four public datasets show that BSGFormer achieves better enhancement than popular methods in both subjective visual quality and objective evaluation.

Review on Enhancing Reasoning Abilities of Large Language Model Through Structured Thinking Prompts

TAO Jiangyao, XI Xuefeng, SHENG Shengli, CUI Zhiming, ZUO Yan

Computer Engineering and Applications 2025, 61 (6): 64-83. DOI: 10.3778/j.issn.1002-8331.2405-0069

Abstract （36）

PDF（pc）（1132KB）（54）

In recent years, the field of natural language processing has witnessed the rapid rise of prompt learning, particularly with the outstanding performance demonstrated in large language models such as GPT and Claude. It has sparked widespread academic interest and extensive research. In view of its immense potential, how to understand the underlying mechanisms of prompt learning and how to develop more efficient prompt design strategies have become pressing issues in the field. This paper introduces the concept of structured thinking prompts, aiming to systematically analyze and reconstruct existing prompt learning paradigms from the perspective of human cognitive logic. The paper explains the basic principles of prompt learning and delves into how cognitive science theories provide inspiration and guidance for prompt design. It then constructs a comprehensive structured thinking prompt framework, detailing four core methods: chain-of-thought prompts, decomposition-based prompts, framework-based prompts, and team collaboration-based prompts. These methods highlight the unique value of structured thinking prompts in enhancing model performance and generalization capabilities. Furthermore, the paper proposes an evaluation system for structured thinking prompts, aiming to assess their effectiveness scientifically and objectively. It also explores various optimization strategies to further improve the efficiency and effectiveness of prompt design. Additionally, the challenges currently faced by structured thinking prompts, particularly the issue of rising computational costs, are discussed, providing direction for future research. The paper envisions the future development trends of structured thinking prompts, emphasizing their pivotal role and potential opportunities in advancing not only natural language processing but also the broader field of artificial intelligence.

Knowledge Graph Embedding Model Incorporating Multi-Level Convolutional Neural Networks

LI Min, LI Xuejun, LIAO Jing

Computer Engineering and Applications 2025, 61 (6): 192-198. DOI: 10.3778/j.issn.1002-8331.2310-0360

Abstract （122）

PDF（pc）（748KB）（52）

Knowledge graph embedding projects entities and relations into a continuous low-dimensional embedding space to learn the features of triples. Translation-based models cannot extract deep knowledge and have limited feature expression ability. Neural network-based models can extract deep knowledge, but they easily lose shallow knowledge and have weak feature interaction ability between entities and relations. In order to fully extract both the shallow and deep features of triples in neural network-based models, this paper introduces a knowledge graph embedding model incorporating multi-level convolutional neural networks, called ConvM. The ConvM model uses a recombination embedding method that cross-arranges head entities and relations to enhance the feature interaction between them. It also adopts a feature extraction module that combines dilated convolution with one-dimensional and three-dimensional convolution kernels to capture multiscale interaction features between entities and relations. In addition, the ConvM model introduces a residual connection to mitigate the forgetting of original information. Link prediction experiments are conducted on five public datasets. Experimental results demonstrate that the ConvM model outperforms the ConvE model, with the MRR metric improved by 23.3%, 10.8%, and 12.2% on the FB15k, FB15k-237, and Kinship datasets, respectively. These findings highlight the outstanding feature expression capability of the ConvM model and its effective enhancement of link prediction performance.
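
The MRR (mean reciprocal rank) metric quoted above is standard in link prediction and can be sketched as follows. The helper names are hypothetical, and the ranking convention (rank equals one plus the number of candidates scored strictly higher than the true entity) is one common choice that the abstract does not specify.

```python
def rank_of_true(scores, true_index):
    """Rank of the true entity: 1 + number of candidates scored strictly higher."""
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples; 1.0 means every answer ranked first."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

For example, true-entity ranks of 1, 2, and 4 give an MRR of (1 + 0.5 + 0.25) / 3.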

Research on Detection and Classification Model of Illegal and Criminal Android Malware Integrating CBAM

LIU Hongyu, GAO Jian

Computer Engineering and Applications 2025, 61 (6): 317-327. DOI: 10.3778/j.issn.1002-8331.2311-0219

Abstract （34）

PDF（pc）（2665KB）（52）

Illegal and criminal activities involving mobile APPs are increasingly frequent in public security work, yet datasets for detecting illegal and criminal Android malware are limited in quantity and unclear in classification, and feasible identification methods are lacking. To address these issues, a deep learning model that is based on an Android illegal and criminal APP dataset and integrates the CBAM attention mechanism is proposed. Firstly, 6,181 illegal and criminal APPs are collected and organized into 4 families. The APPs are then visualized as grayscale, RGB, and RGBA images. A deep model fused with the CBAM attention mechanism is used for family detection and classification. Experiments on the illegal and criminal APP dataset show that the ResNet18 model fused with the CBAM mechanism reaches an accuracy of 93.52% on RGBA images, 4.04% higher than the model without the mechanism on grayscale images. The CBAM-fused model is also validated on the public Drebin dataset, where the VGG16 model with CBAM achieves an accuracy of 96.35% on RGBA images.
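
The visualization step can be pictured as reinterpreting a binary's raw bytes as pixels. The sketch below is an assumption about how such a mapping might look, since the abstract does not state which file is read, how the width is chosen, or how the tail is padded; `bytes_to_pixels` and its parameters are hypothetical.

```python
def bytes_to_pixels(data: bytes, channels: int, width: int):
    """Lay raw bytes out as rows of pixels.

    channels = 1 gives grayscale, 3 gives RGB, 4 gives RGBA.
    The last pixel and the last row are zero-padded (an assumption).
    """
    # Pad so the byte count is a multiple of the channel count.
    pixel_count = -(-len(data) // channels)  # ceiling division
    padded = data + bytes(pixel_count * channels - len(data))

    # Group consecutive bytes into pixels, then pixels into rows.
    pixels = [tuple(padded[i:i + channels])
              for i in range(0, len(padded), channels)]
    rows = [pixels[i:i + width] for i in range(0, len(pixels), width)]

    # Pad the final row with black pixels so every row has the same width.
    if rows and len(rows[-1]) < width:
        rows[-1] += [(0,) * channels] * (width - len(rows[-1]))
    return rows
```

With four channels, each pixel carries four bytes of the file, so an RGBA image of a given size covers four times as much of the binary as a grayscale one.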

Application of Genetic-Inspired Mapping Strategy in Quantum Circuit Optimization

HAN Zi’ao, LI Hui, LU Kai, LIU Shujuan, JU Mingmei

Computer Engineering and Applications 2025, 61 (5): 94-103. DOI: 10.3778/j.issn.1002-8331.2408-0285

Abstract （36）

PDF（pc）（5428KB）（54）

The current qubit mapping strategies are predominantly deterministic, leading to quantum circuit mappings that lack diversity, making it challenging to balance quality and diversity while being inflexible to various quantum computing tasks. To address this issue, a genetic-inspired quantum circuit mapping strategy (GQCMS) has been developed. This strategy introduces diversified crossover and mutation operations, enabling the algorithm to continually generate diverse candidate solutions through broader search spaces, thereby avoiding local optima and increasing the likelihood of identifying global optima, which ultimately enhances the overall quality of circuit mapping. Additionally, to overcome the frequent SWAP operations caused by distant qubit placements in traditional mapping strategies, a proximity-gate-based initialization method is proposed. By prioritizing nearest-neighbor qubits, this method reduces the need for SWAP operations, decreases circuit complexity, and shortens computation time during the mapping process. Experimental results indicate that GQCMS significantly outperforms 2QAN in the t|ket> and Qiskit compilers, with an average reduction of 44.8% and 62.5% in SWAP gate count, respectively, and a 42.9% average reduction in mapping runtime in t|ket>.

Single Image Deraining Using Rainy Streak Degradation Prediction and Pre-Trained Diffusion Prior

XIE Ruilin, WU Hao, YUAN Guowu

Computer Engineering and Applications 2025, 61 (6): 304-316. DOI: 10.3778/j.issn.1002-8331.2311-0264

Abstract （22）

PDF（pc）（15753KB）（50）

Images captured in rainy weather suffer degraded visual quality and reduce the accuracy of subsequent tasks due to interference from rain streaks. To effectively exploit the generative prior of diffusion models while avoiding the computational burden of retraining a conditional diffusion model, a single image deraining method combining rain streak degradation prediction with an unconditional pre-trained diffusion model is proposed. A convolutional dictionary learning mechanism is used to obtain rain streak maps from rainy images, and the rain streak map then guides the null-space diffusion model. Deraining is thus performed with an existing pre-trained unconditional diffusion model, effectively improving the quality of image deraining. Compared with other single image deraining methods, the method achieves a maximum PSNR improvement of 0.44 dB (+1.1%), SSIM improvement of 0.006 (+0.7%), and LPIPS improvement of 0.008 (+42.1%) on the Rain100H and Rain100L datasets, achieving state-of-the-art results on both datasets.
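
As a reminder of what the PSNR figures measure, here is a minimal sketch of the standard definition on flattened pixel lists. The function name and the 8-bit peak value of 255 are assumptions; the abstract does not state the evaluation protocol (for example, whether PSNR is computed on the luminance channel).

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of 0.44 dB corresponds to roughly a 10% reduction in mean squared error.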

Study of Real-Time Semantic Segmentation Algorithms with Improved Parallel Two-Branch Structure

MIAO Siqi, DU Yu, YAN Chao, XU Cheng, SUN Huihui

Computer Engineering and Applications 2025, 61 (5): 233-240. DOI: 10.3778/j.issn.1002-8331.2310-0034

Abstract （39）

PDF（pc）（1108KB）（52）

In order to solve the problem of small target information being lost and details being swamped by context in road scenes, a DDRPNet model with a parallel two-branch structure is proposed. The proposed DDRPNet has two notable features. Firstly, the PAPPM module is introduced to fuse semantic edge features at different scales. Secondly, a coordinate attention mechanism is added after the 1/16, 1/32, and 1/64 resolution feature maps of the low-resolution branch to capture position and channel information at different scales and compensate for the loss of small target information. The efficacy of the proposed DDRPNet is verified on the Cityscapes dataset, where the model reaches a mean intersection over union (mIoU) of 76.28% at 46.3 FPS. On the CamVid dataset, the model reaches an mIoU of 73.2% at 95.2 FPS. The model achieves a good balance between accuracy and speed, and its semantic segmentation performance is significantly improved, giving it potential applications in the field of intelligent driving.
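
The segmentation metric reported above is the mean intersection over union. A minimal sketch of how it is typically computed from flattened label lists follows; the function name is hypothetical, and skipping classes that appear in neither prediction nor ground truth is a common convention assumed here.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes, from flat lists of predicted and true labels."""
    # Confusion matrix: rows are true classes, columns are predicted classes.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(num_classes)) - tp
        fn = sum(conf[c]) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and target
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

Averaging per class rather than per pixel keeps small classes, such as distant pedestrians or poles, from being drowned out by road and sky.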

Improved HCA* Path Planning for UAS Traffic Management

CHEN Ming, HE Ning, HONG Chen, XIAO Mingming, JING Hongyuan

Computer Engineering and Applications 2025, 61 (6): 361-368. DOI: 10.3778/j.issn.1002-8331.2309-0333

Abstract （35）

PDF（pc）（1308KB）（49）

To address the pre-flight conflict detection and resolution (CDR) problem in unmanned aerial vehicle traffic management (UTM), the problem is formulated as a new variant of the multi-agent path finding (MAPF) model, and a continuous-time hierarchical cooperative A* (CHCA*) algorithm is proposed. Firstly, agents move continuously between positions in the metric space at maximum speed in a continuous search space. Secondly, the size and shape of each agent are considered, and conflicts are determined by whether the agents' shapes overlap. Finally, the calculation of the search heuristic value is optimized. Experiments show that the success rate of CHCA* is higher than that of continuous-time conflict-based search (CCBS) on one-shot path planning, and that CHCA* is suitable for solving large-scale problems. A simulation experiment based on a consultancy study of predicted UAV delivery traffic in Sendai, Japan, in 2030 shows that for 32,887 random requests in a day, the success rate of CHCA* reaches up to 96%.
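
The shape-overlap conflict test in continuous time can be sketched for the simplest case. The example assumes disc-shaped agents moving at constant velocity over a shared time interval; the abstract does not specify the shapes used by CHCA*, so the disc model and the function name are illustrative assumptions.

```python
def discs_conflict(p1, v1, r1, p2, v2, r2, duration):
    """Return True if two moving discs overlap at any time in [0, duration].

    p1, p2 are (x, y) start positions; v1, v2 are (vx, vy) constant velocities;
    r1, r2 are radii.
    """
    # Work in the frame of agent 1: relative position and relative velocity.
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]

    # Time of closest approach, clamped to the interval.
    speed_sq = dvx * dvx + dvy * dvy
    if speed_sq == 0:
        t_closest = 0.0  # same velocity: the separation never changes
    else:
        t_closest = max(0.0, min(duration, -(dx * dvx + dy * dvy) / speed_sq))

    # The discs overlap if the closest centre distance is below r1 + r2.
    cx, cy = dx + dvx * t_closest, dy + dvy * t_closest
    return cx * cx + cy * cy < (r1 + r2) ** 2
```

Solving for the closest approach in closed form avoids discretizing time, which is what allows conflicts to be checked exactly in a continuous search space.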

Image Depth Estimation Algorithm Incorporating Adaptive Sampling and Context-Aware Module

WANG Guoxiang, LI Changlong, SONG Junfeng, YE Zhen, JIN Heng

Computer Engineering and Applications 2025, 61 (5): 261-268. DOI: 10.3778/j.issn.1002-8331.2310-0301

Abstract （32）

PDF（pc）（1079KB）（50）

Depth estimation aims to predict dense depth maps of the scene from a few sparse depth samples. Existing works directly generate the final depth prediction without sufficiently exploiting the geometric information in sparse depth maps, which limits the prediction accuracy of the depth estimation algorithm. To solve this problem, an image depth estimation algorithm incorporating adaptive sampling and a context-aware module is proposed to progressively predict depth maps from coarse level to fine level. Firstly, a pre-trained depth completion network is introduced to predict coarse-level dense depth maps and obtain rich scene structures and semantic information. Then, adaptive sampling is designed to guide the model to pay more attention to distant regions, which alleviates the long-tail problem of depth data. Meanwhile, the newly designed context-aware module captures and fuses multi-scale features to obtain more context information about the scene. Experimental results on the NYU-Depth-v2 dataset show that the heuristic depth estimation network surpasses the compared methods on several indicators. Results of the ablation study demonstrate the effectiveness of the proposed modules. Zero-shot experiments verify the generalization ability of the proposed algorithm, and the accuracy indicator δ<1.25 improves by 42 percentage points over P3D and 3.8 percentage points over S2D, respectively.
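
The accuracy indicator δ<1.25 is the fraction of pixels whose predicted depth is within a factor of 1.25 of the ground truth. A minimal sketch on flattened depth lists follows; the function name is hypothetical, and ignoring pixels with non-positive depth is a common convention assumed here.

```python
def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) below the threshold."""
    # Pixels without a positive ground-truth or predicted depth are ignored.
    valid = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not valid:
        return 0.0
    hits = sum(1 for p, g in valid if max(p / g, g / p) < threshold)
    return hits / len(valid)
```

Because the test uses a ratio, a 0.5 m error counts as a miss at 1 m but as a hit at 10 m, which is why long-tailed distant regions affect the metric differently from absolute-error measures.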

Strip Surface Defect Detection Algorithm Integrating Res2Net and PConv

HU Kaitao, MA Xianghua, SUN Xiangyu, LIU Chuang

Computer Engineering and Applications 2025, 61 (5): 334-343. DOI: 10.3778/j.issn.1002-8331.2309-0234

Abstract （49）

PDF（pc）（1496KB）（49）

In order to improve the hierarchical feature extraction capability and detection efficiency for strip surface defects, a rapid detection network (MSPC-Net) based on multi-scale representation and partial convolution (PConv) is proposed. Contrast limited adaptive histogram equalization (CLAHE) is introduced into the model to highlight the defect characteristics of the strip surface, and a new detection layer is added to YOLOv5s to improve the detection rate of defect targets of different sizes. A multi-scale feature extraction block (BRE-block) that integrates Res2Net and the ECA attention mechanism is designed. This block not only obtains fine-grained features but also increases the receptive field of the model. By incorporating PConv, the computational cost of the model (FLOPs) is reduced and the aggregation of partial feature information is enhanced. Experimental results on the NEU-DET dataset show that the mean average precision (mAP@IoU=0.5) reaches 80.2%, which is 5.9 percentage points higher than the original baseline network. At the same time, the improved network reaches 157 FPS, much higher than recent widely used target detection algorithms, effectively improving the detection efficiency of strip surface defects.

Improved RT-DETR Algorithm for Aerial Small Object Detection

LIU Siyuan, GAO Kai, YONG Longquan

Computer Engineering and Applications 2025, 61 (4): 272-281. DOI: 10.3778/j.issn.1002-8331.2407-0399

Abstract （117）

PDF（pc）（1975KB）（116）

To address the issue of missed and false detections of small objects in aerial photography images by existing object detection algorithms, an improved algorithm based on RT-DETR (real-time detection transformer) is proposed. Partial convolution (PConv) is introduced into the backbone network, and a PConvBlock structure is designed. Then, a BasicBlock-PConvBlock module composed of PConvBlocks replaces the original BasicBlock, effectively reducing the number of model parameters. The bidirectional feature pyramid network (BiFPN) structure is adopted to optimize the feature fusion module. The S2 feature is introduced to enhance the detection of small objects. The CARAFE upsampling operator is introduced to strengthen the fast fusion of multi-scale features. Experimental results show that the improved model has 13.9% fewer parameters than the RT-DETR model, and the mAP0.5 and mAP0.5:0.95 indicators are improved by 2.4 and 1.9 percentage points, respectively, on the VisDrone test set. On the TT100K and DOTA datasets, the improved model also outperforms the RT-DETR algorithm. The improved model significantly enhances detection accuracy while maintaining a smaller parameter count and computational cost, meeting the real-time detection requirements for drone aerial photography images.

Human Pose Estimation with Multi-Scale and Multi-Level Feature Fusion

WANG Yanni, HU Min, HAN Shipeng, CHEN Yixuan, LYU Hao

Computer Engineering and Applications 2025, 61 (6): 199-209. DOI: 10.3778/j.issn.1002-8331.2310-0407

Abstract （44）

PDF（pc）（2007KB）（47）

The accuracy improvement of human pose estimation usually depends on feature fusion. However, the existing feature fusion strategies often ignore the interaction between scale features and level features. The fusion of single mode may result in less significant feature expression. To make full use of the complementarity between different features, a new multi-scale and multi-level feature fusion network (MSLNet) is proposed. The high-resolution network (HRNet) is used as the backbone to exchange information between feature maps of different resolutions through cross-scale information exchange, and to obtain both fine-grained and coarse-grained pose features. The expectation maximization attention bidirectional feature pyramid network (EMA-BiFPN) is introduced to achieve multi-level feature aggregation after multi-scale feature fusion. The details and correlation information of human pose are captured from local to global. A keypoint detection head composed of residual structure is designed to complete the final fusion of output features and improve the accuracy of human keypoint detection. The experimental results show that MSLNet achieves the best accuracy of 75.8% and 91.1% on COCO and MPII datasets, respectively. It is fully verified that MSLNet can make use of the complementarity between scale features and level features to improve the accuracy of human pose estimation.

Computer Engineering and Applications 2025, 61 (5): 0-0.

Abstract （54）

PDF（pc）（717KB）（47）

Most Downloaded Articles