Table of Contents

    2025-12-01, Volume 61 Issue 23
    Research Hotspots and Reviews
    Review of Question Answering Techniques for Knowledge Graph
    QIAN Shenyi, FU Bowen, LI Daiyi, LIANG Yaoyao
    2025, 61(23):  1-23.  DOI: 10.3778/j.issn.1002-8331.2501-0066
    Intelligent question answering is a key technology for obtaining required information accurately and quickly from massive data. In recent years, it has developed remarkably, with advances such as question-based information extraction, semantic understanding, and vector modeling methods. As the field grows rapidly, however, a reasonable classification of intelligent question answering models is needed, both to help users in different fields apply them and to support in-depth study by researchers. Through an investigation of the relevant literature on knowledge graph Q&A, this paper summarizes its key technologies, including entity linking and knowledge embedding, and introduces the related concepts and processing flow in detail. According to the methods used, knowledge graph-oriented question answering techniques are divided into three main categories: semantic parsing, information retrieval, and large language model-based methods. The advantages and disadvantages of each category are introduced, and the evaluation metrics of knowledge graph question answering models are summarized. Finally, some suggestions and thoughts are put forward on the existing problems and future development directions of knowledge graph question answering technology.
    Survey of Vision Transformers for Fine-Grained Image Classification
    WEN Shixiong, ZHI Min
    2025, 61(23):  24-37.  DOI: 10.3778/j.issn.1002-8331.2503-0014
    Fine-grained image classification (FGIC) aims to identify subcategories that are visually highly similar yet exhibit subtle differences. With the rapid advancement of deep learning, FGIC algorithms have gradually evolved from traditional fully supervised learning to weakly supervised approaches. Vision Transformers (ViTs), leveraging multi-head self-attention mechanisms, eliminate the reliance on manual annotations and overcome the limitations of convolutional neural networks (CNNs) in terms of receptive field size and global modeling capacity, becoming one of the mainstream methods for this task. This paper first outlines the key characteristics and challenges of FGIC, and briefly introduces the architecture and advantages of ViT. Based on different feature fusion strategies, existing ViT-based improvements are categorized into hierarchical fusion, multi-local fusion, and multi-granularity fusion. The modifications of each category are illustrated in detail, and their underlying mechanisms are systematically analyzed and summarized. In addition, commonly used public datasets are reviewed, and future research directions are proposed based on current limitations, aiming to further explore the potential of ViT in FGIC tasks.
    Review of Unsupervised Learning Methods for Surface Anomaly Localization in Industrial Images
    ZHAO Jun, ZHAO Juanjuan
    2025, 61(23):  38-58.  DOI: 10.3778/j.issn.1002-8331.2503-0133
    The rapid development of deep learning has marked a milestone for anomaly detection and localization in industrial images. The demand for comprehensive and in-depth exploration of specific methods and emerging trends in this field continues to grow in existing research, surpassing traditional supervised training paradigms. The background, current developments, and key challenges of anomaly localization methods based on self-supervised and unsupervised learning are discussed. The review offers a comprehensive analysis of existing outstanding research in the industrial domain, addressing aspects such as neural network architecture design, special application scenarios, loss function improvements, collection of public datasets, and the use of evaluation metrics. Furthermore, this paper focuses on the role of large visual-language models in few-shot learning for multi-class unified anomaly localization tasks, exploring their cognitive and reasoning capabilities. It summarizes current research findings and highlights future research directions, aiming to enhance the robustness of anomaly localization algorithms in complex real-world scenarios and improve the efficiency of system development. This comprehensive analysis seeks to bridge existing knowledge gaps, provide valuable insights for researchers, and contribute to shaping the future of industrial anomaly localization research.
    Review of Intelligent Analysis Methods for Electrocardiograms in Diagnosis of Arrhythmia and Myocardial Infarction
    HAN Chuang, FAN Baoqi, YU Mengyao, QUE Wenge
    2025, 61(23):  59-71.  DOI: 10.3778/j.issn.1002-8331.2503-0022
    The electrocardiogram (ECG) remains the gold standard for diagnosing arrhythmia and myocardial infarction (MI). Its non-invasiveness, real-time monitoring, and portability have made it widely used in clinical practice, so research on intelligent ECG analysis for these conditions is of great significance. This paper first introduces commonly used ECG databases for arrhythmia and MI. It then reviews advances in intelligent ECG analysis over the past three years, including manual feature extraction, convolutional neural networks and their variants, graph neural networks, self-supervised learning, federated learning, active learning, deterministic learning, and generative models. Subsequently, it analyzes these methodologies in depth from the perspectives of data scale, classification patterns, model comparisons, and model complexity, and compares the requirements, advantages, disadvantages, interpretability, and application scenarios of the different approaches. Finally, it summarizes existing limitations, including data quality and imbalance issues, conflicts between model generalizability and interpretability, trade-offs between privacy protection and collaborative efficiency, and mismatches between computational resources and clinical deployment. Feasible solutions are proposed to address these challenges.
    Survey of Fuzz Testing Embedded Device Firmwares
    CHEN Jingjing, WANG Zhengwu, LAN Wenwei, ZHANG Ruichen, ZHANG Yadong, CUI Zhanqi
    2025, 61(23):  72-89.  DOI: 10.3778/j.issn.1002-8331.2503-0296
    To ensure the security of embedded devices, their firmware must be adequately tested so that vulnerabilities are detected and fixed in time. In recent years, researchers have applied fuzz testing to embedded device firmware, effectively improving testing efficiency. This paper summarizes research results on fuzz testing of embedded device firmware from 2014 to 2024. It divides the fuzz testing process into three stages (preprocessing, test environment establishment, and fuzz testing execution) and introduces the research results of each stage. In addition, the paper discusses the datasets and evaluation metrics used in existing work and looks ahead to future research directions for fuzz testing of embedded device firmware.
    Theory, Research and Development
    Dual-Population Constrained Multi-Objective Wolf Pack Algorithm with Self-Organizing Mapping Update
    KANG Shuiping, TANG Guangqing, FAN Tanghuai, WANG Hui, LYU Li
    2025, 61(23):  90-109.  DOI: 10.3778/j.issn.1002-8331.2504-0270
    To address the shortcomings of conventional multi-objective wolf pack algorithms, including their inability to handle constraints, premature convergence caused by population clustering, and outdated update mechanisms leading to loss of high-quality solutions, this paper proposes a self-organizing map updated dual-population constrained multi-objective wolf pack algorithm (CMOWPA-S). The algorithm constructs a dual-population structure where the main population employs constrained dominance principles to ensure operation within feasible regions, while the auxiliary population disregards constraints to enhance solution quality discovery. A dual-optimization hunting strategy is introduced: elite wolves assist the leader in summoning the pack during the raid phase, and Lévy flight strategy is incorporated in the siege phase to improve local optimum escape capability. A self-organizing map-based population update mechanism is designed to extract neighborhood information for generating superior offspring, ensuring the inheritance of high-quality solutions. Environmental selection strategies are implemented to eliminate redundant populations. To verify the performance of the algorithm, it is compared with 4 classic and 5 emerging constrained multi-objective optimization algorithms on 14 simulated constrained multi-objective problems, and with 5 new constrained multi-objective optimization algorithms on 10 real constrained multi-objective problems. The experimental results show that CMOWPA-S can effectively solve constrained objective optimization problems, avoid falling into local optima, and obtain solutions with good population diversity.
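The Lévy flight mentioned in the abstract above can be illustrated with a minimal sketch. It assumes Mantegna's algorithm for drawing heavy-tailed step lengths, which is a common choice but is not confirmed by the abstract. The `perturb` update rule, the `step_scale` value, and the stability index `beta = 1.5` are hypothetical and do not come from the paper.

```python
import math
import random


def mantegna_sigma(beta):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta=1.5, rng=None):
    """Draw one heavy-tailed step length: mostly small moves, occasionally a long jump."""
    rng = rng or random.Random()
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def perturb(position, leader, step_scale=0.01, beta=1.5, rng=None):
    """Hypothetical siege-phase update: jump relative to the leader wolf (not the paper's rule)."""
    rng = rng or random.Random()
    return [
        x + step_scale * levy_step(beta, rng) * (x - lead)
        for x, lead in zip(position, leader)
    ]
```

The occasional long jumps are what give such a step a chance to leave a local optimum, while the frequent short steps keep the search local.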
    Method for Analyzing Satisfaction with Online Videos Based on Multimodal Emotional Data
    WANG Anqi, LI Mingxuan, CHENG Boxuan
    2025, 61(23):  110-125.  DOI: 10.3778/j.issn.1002-8331.2510-0100
    With the rapid development of the internet and video platforms, online video content has become increasingly diverse. Effectively evaluating user satisfaction with different types of online videos has emerged as a critical issue in video content promotion and human-computer interaction research. Although multimodal sentiment analysis methods integrating text, audio, and vision information have been widely applied to user emotion recognition, emotional states alone cannot fully reflect users’ comprehensive experience of content. Existing research often remains confined to modeling affect polarity, neglecting the underlying mechanisms linking emotion to satisfaction. This has led to the long-term oversight of satisfaction as a higher-order psychological construct. To more accurately assess users’ holistic emotional responses to online videos, the paper proposes MVSA (multimodal video satisfaction analysis), a video satisfaction analysis framework based on multimodal fusion. Concurrently, the paper establishes MVS-Eval (multimodal video satisfaction evaluation), the first multimodal dataset specifically designed for online video user satisfaction research. This dataset encompasses satisfaction tags across multiple dimensions, including attractiveness, concentration, and engagement. This aims to comprehensively model users’ subjective feedback on video content. Furthermore, the paper proposes the multimodal satisfaction estimation algorithm MUSE (multimodal understanding for satisfaction estimation), based on modality consistency training and satisfaction-guided fusion mechanisms. This effectively establishes the emotion-satisfaction link and enhances the model’s satisfaction metric prediction performance and cross-scenario generalization capability. Additionally, the MVSA framework integrates an intelligent feedback processing platform that automatically parses user feedback videos and generates structured satisfaction evaluation results. Experimental results demonstrate that MUSE significantly outperforms existing mainstream models across multiple benchmark tasks, validating its effectiveness and interpretability in modeling satisfaction for diverse online video types.
    Pattern Recognition and Artificial Intelligence
    Driver Behavior Recognition Method Using Dual-Sequence Pose Integration
    TAN Dayi, TIAN Wei, XIONG Lu
    2025, 61(23):  126-134.  DOI: 10.3778/j.issn.1002-8331.2408-0410
    Identifying dangerous driving behavior patterns can enhance driving safety and is a crucial aspect of autonomous driving technology research. Currently, image-based driver behavior recognition methods face challenges such as high computational costs and information redundancy. To address these issues, a novel driver behavior recognition method called SimPoseConv3D is proposed, which integrates dual-sequence posture information. Firstly, the SimCC module extracts driver pose heatmap sequences from video. These heatmaps are then stacked, cropped, and sampled along the temporal dimension. Subsequently, the heatmap volumes are fused in both forward and backward directions along the time axis before being input into a 3D CNN to extract spatiotemporal features for behavior recognition. Training and testing on the Drive&Act dataset, along with ablation experiments, show that the proposed method achieves recognition accuracies of 70.25% and 79.04% on Task-level (overall behavior) and Mid-level (fine-grained behavior) test sets, respectively, representing improvements of 6.07 and 4.13 percentage points over the current best public methods. Additionally, using SimCC as the pose estimator enhances computational efficiency by 18.51% compared to traditional pose estimators.
    Auto-Weighted Multi-View Clustering Incorporating Consensus and Diversity
    YAO Yiying, CHEN Mei, WANG Jie, GUO Aixia
    2025, 61(23):  135-148.  DOI: 10.3778/j.issn.1002-8331.2408-0004
    Multi-view clustering exhibits excellent clustering performance because it can fully integrate information from multiple views. However, most of the existing methods focus only on the consensus information among views, while ignoring the diversity information among views and lacking the accuracy of approximation rank, which ultimately affects the effectiveness of the clustering results. To handle this issue, a multi-view clustering algorithm based on the tensor log-determinant, named auto-weighted multi-view clustering incorporating consensus and diversity is proposed. Specifically, this algorithm first constructs an initial similarity graph for each view, and then uses the tensor log-determinant to maximally approximate the true value of the rank. Subsequently, the algorithm explores intra-view and inter-view diversity information by using a diversity term, and uses an adaptive weighted graph fusion term to extract consensus information from each view. Through iterative optimization, a high-quality fusion graph is finally obtained. Experimental results on eight real-world datasets show that the proposed method significantly outperforms state-of-the-art baselines.
    Negative Pseudo-Label Analysis for Semi-Supervised Action Recognition in Video Transformer
    LUO Deyan, XU Yang, ZUO Fengyun, WANG Minggang
    2025, 61(23):  149-160.  DOI: 10.3778/j.issn.1002-8331.2409-0007
    Action recognition, as a pattern recognition technique, aims to identify and classify human actions or behaviors by analyzing video or image sequences. Given the exponential increase in video data, semi-supervised learning has been incorporated into action recognition models; however, there is still significant room for improvement in classification performance. Because the vision Transformer has demonstrated superior performance compared to CNNs in image processing, this work improves the training paradigm of video Transformers in semi-supervised learning. Firstly, pre-trained weights are employed to initialize the network, addressing the high training cost associated with Transformer architectures. Secondly, logit standardization preprocessing is introduced to remove the forced matching constraint between student and teacher logits. Finally, negative learning techniques are integrated to dynamically assess model performance and allocate negative pseudo-labels, addressing the issue of inadequate utilization of ambiguous prediction examples. The experimental results demonstrate that the improved semi-supervised video Transformer network achieves superior recognition performance compared to traditional convolutional networks on two widely used video action recognition datasets, UCF-101 and HMDB-51. Specifically, the improved network model outperforms the baseline model on the UCF-101 dataset by 6.4 and 1.5 percentage points at 1% and 10% label rates, respectively. On the HMDB-51 dataset, the improved model shows improvements of 5.2, 3.6, and 3.1 percentage points at 40%, 50%, and 60% label rates, respectively.
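The logit standardization step named in the abstract above can be sketched as follows. This is a minimal illustration that assumes the usual z-score form (subtract the mean, divide by the standard deviation) applied before the softmax. The function names, the `eps` constant, and the sample logits are hypothetical and are not taken from the paper.

```python
import math


def standardize(logits, eps=1e-7):
    """Z-score the logits so that only their relative shape matters."""
    mean = sum(logits) / len(logits)
    var = sum((x - mean) ** 2 for x in logits) / len(logits)
    std = math.sqrt(var)
    return [(x - mean) / (std + eps) for x in logits]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical teacher and student logits that agree on the ranking
# but differ in scale by a factor of ten.
teacher = [10.0, 20.0, 30.0]
student = [1.0, 2.0, 3.0]

p_teacher = softmax(standardize(teacher))
p_student = softmax(standardize(student))
```

Without standardization the two softmax outputs differ sharply, so a distillation loss would push the student to copy the teacher's logit magnitude. After standardization the two distributions coincide, which is one way to read the abstract's phrase about removing the forced matching constraint.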
    Refined Multi-Scale Feature and Parallel Attention Based Crowd Detection
    ZHANG Xin, KANG Shining, YANG Yuqi, WANG Jun, MA Zhiyuan
    2025, 61(23):  161-172.  DOI: 10.3778/j.issn.1002-8331.2409-0077
    Crowd detection has wide applications in fields such as autonomous driving, traffic management, and intelligent security. It is characterized by high crowd density, significant pedestrian occlusion, large scale variation, and irregular crowd distribution, which makes it one of the challenging problems in computer vision. To further explore the rich multi-scale information in dense scenes and address the challenges of irregular crowd distribution and shapes, a crowd detection algorithm based on refined multi-scale and parallel attention mechanisms is proposed in this paper, named RMF R-CNN (refined multiscale feature R-CNN), building upon Sparse R-CNN. Firstly, a receptive field fusion module is proposed using parallel dilated convolutions of different scales to extract refined scale information. Then, a parallel attention module is constructed based on dilated convolution attention and deformable convolution attention to perceive crowd distribution and shape information from different scales. Finally, to mitigate loss sensitivity caused by data mislabeling and pedestrian scale, a dynamic loss weight is added to the original loss function, allowing the loss to dynamically change according to pedestrian scale and prediction accuracy, and enhancing the method’s generalization ability. Experimental results show that the proposed algorithm achieves an AP of 91.1%, an MR⁻² of 44.5% and a Recall of 96.7% on datasets such as CrowdHuman and CityPersons. It also shows that the proposed algorithm can improve the performance of crowd detection in dense scenes.
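As a small worked example of why parallel dilated convolutions cover different scales, the sketch below computes the spatial extent of a single dilated kernel with the standard formula k + (k - 1)(d - 1). The kernel size of 3 and the dilation rates 1, 2, and 3 are assumptions for illustration. The abstract does not state the rates used in the receptive field fusion module.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def parallel_branch_extents(kernel_size=3, dilations=(1, 2, 3)):
    """Extent of each parallel branch, keyed by its (assumed) dilation rate."""
    return {d: effective_kernel_size(kernel_size, d) for d in dilations}


extents = parallel_branch_extents()
print(extents)  # {1: 3, 2: 5, 3: 7}
```

Each branch keeps the same nine weights but samples a wider neighborhood, so the fused output sees several scales at once without extra parameters per branch.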
    Multi-Object Tracking Algorithm with Wide-Angle Feature Fusion Memory Network
    ZHANG Beining, TANG Min, LI Hongjun, XIE Zhengguang
    2025, 61(23):  173-180.  DOI: 10.3778/j.issn.1002-8331.2410-0484
    Multi-object tracking (MOT) in aerial drone videos presents significant challenges due to wide viewing angles, small and distant targets, and rapid target movements, which limit the effectiveness of traditional methods. This paper proposes WideTrack, an innovative Transformer-based MOT method, which employs a wide-angle feature fusion memory network to enhance tracking capability for small, distant targets. To adapt to drone motion, the paper integrates novel track confidence modeling into the filtering process. Additionally, it develops a data association strategy combining a motion feature extraction model and spatially informed WIoU matching, which effectively merges appearance and motion cues to track fast-moving targets. Experimental results on the VisDrone-MOT and UAVDT datasets demonstrate that WideTrack outperforms existing methods, establishing its efficacy and robustness in drone-based MOT tasks. Experimental results show that WideTrack improves the MOTA score by 5.3 percentage points over the best existing model on the VisDrone-MOT dataset. Moreover, the model achieves processing speeds of 16 frames per second on the VisDrone-MOT dataset and 29 frames per second on the UAVDT dataset, demonstrating its effectiveness in drone-based multi-object tracking tasks.
    MSM-AG: Multi-Modal Summarization Model with Image Object Anchor Guidance
    ZHAO Bowen, MA Tinghuai
    2025, 61(23):  181-194.  DOI: 10.3778/j.issn.1002-8331.2409-0110
    This study focuses on the core semantic analysis of multi-modal input data, aiming to generate text summaries that integrate multi-modal information and select the most relevant images as image summaries to match the text summaries. This field currently faces two major challenges: (1) The difficulty of quantifying the semantic correlation between text and images hinders the mining of key meanings shared across modalities. (2) The high redundancy in source modality data complicates the precise focus on critical information within the summary. To address these challenges, the proposed model introduces an innovative multi-modal summarization approach guided by image anchor points, named MSM-AG (multi-modal summarization model with image anchor guidance). This model constructs a mechanism for selecting image anchor points, identifies key target anchors within images, and categorizes text and image modality samples into positive and negative classes accordingly. Contrastive learning methods are employed to enhance the distinction between these categories, allowing the model to select image summaries that highly correspond with the text summaries. Extensive experiments conducted on the HCSCL multi-modal news dataset demonstrate that MSM-AG outperforms existing multi-modal summarization models across various evaluation metrics, effectively addressing fundamental challenges in multi-modal summarization.
    Integrating External Knowledge to Enhance Multi-Modal Named Entity Recognition
    MA Yupeng, ZHANG Ming, LI Zhiqiang, GAO Ziling
    2025, 61(23):  195-204.  DOI: 10.3778/j.issn.1002-8331.2409-0116
    Multi-modal named entity recognition (MNER) aims to use information from multiple modalities, such as text and images, to identify predefined types of entities in text. Although existing methods have made some progress, they still face two challenges: (1) It is difficult to establish a unified representation that bridges the gap between different modalities. (2) It is difficult to achieve efficient semantic interaction between different modalities. Therefore, this paper proposes an enhanced multi-modal named entity recognition model that incorporates external knowledge. Firstly, in the modal representation stage, the model introduces the contrastive language-image pre-training (CLIP) model and uses the prior cross-modal text-image knowledge it contains to enhance the semantic representation of text and images and to narrow the modality gap. Secondly, in the modal fusion stage, a cross-modal cross-attention mechanism and a cross-modal gating mechanism are designed to fuse modal information, effectively eliminating noise in the image and further enhancing semantic interaction. Finally, a conditional random field (CRF) is used to recognize named entities. The F1 scores of the proposed method reach 75.35% and 86.18% on the benchmark datasets Twitter2015 and Twitter2017 respectively, validating the effectiveness of this method.
    Unsupervised Industrial Defect Detection Method Based on Diffusion Model and Knowledge Distillation
    LIU Mingming, SHI Weifeng, FAN Xuehui, ZHANG Haiyan
    2025, 61(23):  205-211.  DOI: 10.3778/j.issn.1002-8331.2506-0028
    In recent years, industrial defect detection models based on unsupervised learning have achieved significant performance improvements. However, existing defect synthesis strategies rely on external data sources, resulting in significant differences between the synthesized defects and some real defects, which seriously restricts the generalization performance of the model. Furthermore, existing reverse distillation methods lose feature detail information, which causes false detections. To this end, a multi-source defect synthesis strategy is first introduced. By combining images generated by a diffusion model with images synthesized from the DTD dataset, it produces defect samples that better match the real defect distribution. Then, the synthetic defect samples are used to fine-tune the teacher network and improve its ability to represent defects. Subsequently, an anomaly masking module is introduced to address the issue of excessive generalization caused by teacher-student network isomorphism. Finally, a detail repair module is constructed to enhance the student network’s ability to reconstruct the details of the teacher’s features through cross-level feature fusion. Quantitative and qualitative experiments are conducted on the MVTec AD standard dataset. Compared with the benchmark model, the proposed method achieves better performance in terms of both image-level and pixel-level AUROC scores.
    Graphics and Image Processing
    Human Pose Estimation with Semantic Enhancement and Adaptive Multi-Scale Feature Fusion
    ZHANG Jiabo, HE Ajuan, TANG Shangsong
    2025, 61(23):  212-223.  DOI: 10.3778/j.issn.1002-8331.2407-0177
    Due to the small scale and sensitive location of keypoints, how to effectively extract spatial and semantic information has always been the main challenge of pose estimation task. In order to solve this problem, this paper proposes a semantic-enhanced and adaptive multi-scale feature fusion network (SAMFFNet) for human pose estimation. SAMFFNet utilizes the lightweight MobileNetV2 as the backbone network to build the feature pyramid, and uses EfficientViT to generate scale-aware global semantics. In the designed deep semantic injection module, the content-guided attention is used to fuse global semantics with local features to enhance the semantic representation of key points. Furthermore, an adaptive multi-scale feature fusion module is proposed, which can dynamically adjust the large spatial receptive field according to the input features and enhance the information interaction between features at different scales by integrating the large selective convolution kernel module (LSK) and the cross-layer interaction mechanism. The experimental results show that on the COCO validation set, SAMFFNet has improved its accuracy index by 6.1 percentage points compared to the backbone network, reaching 70.7%. Although its accuracy is slightly lower than that of the larger model SimpleBaseline, it has reduced the number of parameters by 85.0% and the computational complexity by 78.3%. On the MPII dataset, an accuracy improvement of 2.3 percentage points is also achieved compared to the backbone network. The comprehensive performance on the COCO and MPII datasets fully confirms the effectiveness of SAMFFNet in enhancing human structural features and feature fusion strategies.
    Small Object Detection Network Based on Step-by-Step Adaptively Feature Fusion Module
    CHEN Peng, LIN Bin, BAI Yong, HUANG Weilun
    2025, 61(23):  224-232.  DOI: 10.3778/j.issn.1002-8331.2409-0081
    Small object detection plays a role in tasks such as driving assistance, smart healthcare, and drone inspections. Multi-scale feature learning is a commonly adopted strategy in designing small object detection networks. The classic feature pyramid structure achieves multiscale information transmission by integrating feature maps from different levels, thereby capturing key information about small objects across feature maps of varying resolutions. However, when fusing feature maps at different scales, semantic information conflicts often arise, leading to inconsistent gradient computations and causing the information of small objects to be overwhelmed. Therefore, a step-by-step adaptive feature fusion module (SAFF) is proposed, which divides the feature fusion process into three sequential stages. By progressively fusing adjacent scale feature maps, it resolves the issue of semantic conflict during the fusion process. Additionally, within each stage, adaptive feature fusion can alleviate the problem of inconsistent gradient calculations. The SAFF module is applied to general object detection networks to form the SAFF-RCNN and Cascade-SAFF-RCNN networks dedicated to small object detection. Experimental results show that the proposed networks achieve significant improvements in small object detection performance, reaching or surpassing other mainstream small object detection models, thus demonstrating the effectiveness of the proposed SAFF module in small object detection.
    Lightweight and Synergistically Enhanced YOLOv8n Model for Traffic Sign Detection
    FANG Tianrui, CHENG Guang, LIU Hailin, TANG Shaohu
    2025, 61(23):  233-247.  DOI: 10.3778/j.issn.1002-8331.2507-0109
    A lightweight object detection model, RACP-YOLO (reconstruction-aware compressed prediction YOLO), is proposed to address challenges in traffic sign detection, including missed detection of small targets, interference from complex backgrounds, and excessive model complexity. The backbone integrates a compact C2F-RVB module to improve low-level semantic representation and employs an ADown module for multi-scale downsampling, effectively balancing resolution and receptive field to enhance object perception. A channel-aware attention (CAA) mechanism is used to strengthen inter-channel dependencies and saliency response. The core improvement lies in the proposed SCConv detection head, composed of a spatial reconstruction unit (SRU) and channel reconstruction unit (CRU) in a dual-branch design. Combined with an additional P2 branch, the resulting SCHead enhances spatial modeling for small-scale and local targets. Experimental results on the TT100K dataset demonstrate that RACP-YOLO achieves a mAP@0.5 of 0.685, surpassing YOLOv8n by 2.1%. The number of parameters is reduced from 3.01×10⁶ to 1.12×10⁶ (a reduction of 62.8%), and computational cost drops from 8.1×10⁹ to 4.3×10⁹ (a reduction of approximately 46.9%). Furthermore, generalization experiments on the CCSTB dataset confirm that the proposed model maintains stable detection performance and strong adaptability in complex scenarios, such as nighttime, strong light, and rainy conditions. This improvement enables higher detection accuracy while significantly enhancing model compactness and deployment efficiency, making it well-suited for real-time applications in in-vehicle and edge scenarios.
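The two reduction percentages quoted in the abstract above can be checked with a few lines of arithmetic. The sketch reads the reported figures as powers of ten (3.01 million and 1.12 million parameters, 8.1 billion and 4.3 billion for computational cost). The helper name `relative_reduction` is introduced here for illustration only.

```python
def relative_reduction(before, after):
    """Percentage by which a quantity shrinks going from `before` to `after`."""
    return (before - after) / before * 100.0


# Figures as reported in the abstract, read as powers of ten.
params_reduction = relative_reduction(3.01e6, 1.12e6)
cost_reduction = relative_reduction(8.1e9, 4.3e9)

print(f"{params_reduction:.1f}% {cost_reduction:.1f}%")  # 62.8% 46.9%
```

Both results agree with the percentages stated in the abstract.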
    LGM-YOLOv11: Underwater Object Detection Model Fusing Multi-Scale Attention Mechanism
    CHEN Hui, YU Yongjie
    2025, 61(23):  248-263.  DOI: 10.3778/j.issn.1002-8331.2506-0362
    Underwater images play a crucial role in applications such as marine ecological environment monitoring and underwater resource development. However, underwater images are often affected by light scattering, suspended particles, and color attenuation, resulting in low contrast, blurred edges, and noise interference, which in turn reduce the accuracy and efficiency of underwater object detection. To address these challenges, an underwater object detection model integrating a multi-scale attention mechanism is proposed. Firstly, the Laplacian-of-Gaussian stem (LoGStem) is introduced to replace the first two convolutional layers of the YOLOv11 backbone network, enhancing the extraction of edge and texture details in underwater images. Secondly, the gated activation convolution module (GSConv) is proposed and embedded in the feature pyramid network, using a gating mechanism to modulate features dynamically at each spatial position and channel, thereby enhancing the model’s ability to capture details. Then, the multi-scale enhanced parallel attention module (MSEPA) is proposed and integrated into C3k2; through the combined effect of multi-scale feature fusion and multiple attention mechanisms, the receptive field is enlarged and the feature representation is enhanced. Finally, to improve the accuracy and stability of small target localization, the Shape-NWD loss function is used. Experiments on the UTDAC, DUO, RUOD and underwater garbage datasets show that the proposed method achieves the best detection accuracy among the compared models.
    Detection Method of Railway Perimeter Intrusion Combined with Compact Features and Attention
    WANG Hui, LI Zelong, YE Jiangang, TANG Xiaokun, XU Feng
    2025, 61(23):  264-273.  DOI: 10.3778/j.issn.1002-8331.2411-0288
    To address perimeter intrusions that threaten train safety in railway environments, and to overcome the low accuracy and efficiency of existing methods, a perimeter intrusion foreign object detection approach based on the YOLOv9 model is proposed. The proposed feature aggregation module employs a compact architecture to reduce the network’s computational complexity, thereby enhancing detection efficiency. A multi-channel attention mechanism with inverted residual is proposed by integrating the inverted residual structure with the designed multi-channel attention. This approach reduces the number of convolutional parameters, promotes extensive interaction of information across channels, captures key features of the detection target, enhances anomaly detection accuracy, and minimizes both false negatives and false positives. The modified auxiliary detection branch effectively extracts image feature information while reducing the model’s parameter size. Experimental results demonstrate that the proposed model achieves an mAP@0.5 of 93.5% and a recall rate of 89.2% on the railway perimeter foreign object dataset, outperforming the YOLOv9 model by 6.1 and 4.6 percentage points, respectively, while reducing the model’s parameter count by 54.5%. Compared to other mainstream models, the proposed model achieves superior performance across key evaluation metrics, including mAP@0.5, recall rate, false positive rate, and false negative rate, and demonstrates strong performance in perimeter intrusion detection tasks.
    Network, Communication and Security
    Traceable Multi-Authority Dynamic Searchable Encryption Scheme in Fog Computing
    LIU Xueyan, LI Wenjing, JIA Bolong, XU Wenhao
    2025, 61(23):  274-285.  DOI: 10.3778/j.issn.1002-8331.2409-0356
    This paper focuses on the limitations of traditional attribute-based keyword search schemes, such as inefficient search algorithms, single-point performance bottlenecks, and the risk of malicious key leakage. A multi-authority encryption scheme with dynamic keyword search and user traceability is proposed for fog computing environments. Firstly, a dynamic search mechanism is introduced, allowing the search to stop once any keyword in the set matches, improving both flexibility and usability. Secondly, multiple attribute authorities are introduced to manage disjoint attribute sets, which prevents single-point bottlenecks and significantly enhances scalability and system stability. Thirdly, to prevent malicious users from leaking keys, the user’s identity information is embedded in the key so that malicious users can be traced. Once traced, a malicious user is added to the revocation list, which revokes that user’s access rights. Finally, computational tasks are outsourced to fog nodes, reducing the computational burden on resource-constrained terminal users. Security analysis and performance comparisons show that the proposed method is secure and efficient.
    Fail-Stop Attribute-Based Group Signature Scheme
    LIAO Dongxu, CHENG Xiaogang
    2025, 61(23):  286-296.  DOI: 10.3778/j.issn.1002-8331.2409-0162
    Attribute-based group signatures play a crucial role in privacy protection and fine-grained signing. However, existing schemes fail to adequately address scenarios involving adversaries with unlimited computational power and attribute revocation, which results in insufficient security and practicality. To overcome these limitations, this paper proposes a fail-stop attribute-based group signature scheme (FSABGS) that combines attribute-based group signatures with fail-stop signatures. This scheme enables signers with limited computational resources to detect attacks from adversaries with superior computational capabilities. The security of this mechanism is based on information-theoretic principles and does not rely on any computational hardness assumptions. To enhance the scheme’s flexibility, dynamic accumulators are employed to implement identity and attribute revocation through the dynamic accumulation of certificates. This mechanism offers advantages in terms of computational overhead. Furthermore, the signature size generated by the scheme remains constant, and the computational cost is independent of the revocation list. The scheme ensures member anonymity and attribute anonymity, and its security is provable under the random oracle model. Analysis and experimental comparison with similar schemes demonstrate that the proposed scheme incurs lower overhead and is more practical.
    Engineering and Applications
    Object Detection Within 3D Point Cloud via 2D Convolution Neural Network
    LI Xiaoli, WANG Le, DU Zhenlong, CHEN Dong
    2025, 61(23):  297-304.  DOI: 10.3778/j.issn.1002-8331.2409-0082
    LiDAR has seen initial application in autonomous driving and industrial automation, generating vast amounts of point cloud data for scenes and objects. These point cloud data are characterized by high dimensionality and irregularity, and require computationally expensive 3D convolution in existing deep learning models, leading to high spatio-temporal complexity and hindering online application. Addressing the limitations of traditional network models in processing point cloud data, this paper proposes a 3D point cloud object recognition method based on 2D convolutional neural networks. The proposed method statistically regularizes irregular point cloud data into pillars, utilizes convolutions and pooling to extract features from clusters of pillars, converts the 3D point cloud data into 2D image-like features, and employs 2D convolutional neural networks to extract multi-scale latent features from multiple receptive fields. The decoder network then identifies objects within the point cloud based on locations, orientations, and object types. Experiments are conducted on Ascend Atlas 200DK edge devices, achieving a single inference time of 291 ms. Compared with traditional point cloud object detection networks, the proposed method achieves performance gains of 14.7, 13.2, and 3.4 times over VoxelNet, F-PointNet, and SECOND, respectively. On the KITTI dataset, in a comparison with 14 other point cloud object detection algorithms, including ContFuse, the average precision exceeds that of the second-best algorithm by more than 2.3%. Ablation studies focusing on 2D convolutions and attention mechanisms reveal improvements of 50.9% and 5.37%, respectively, in model size and inference accuracy. The experimental results demonstrate that the proposed method can efficiently, robustly, and accurately detect objects within point cloud data.
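    As a rough illustration of the pillar idea in the abstract above, the sketch below scatters points into a 2D grid and builds an image-like tensor that a 2D CNN could consume. It is a simplification under stated assumptions: the function name pillarize, the grid parameters, and the three hand-crafted channels (point count, mean height, maximum height) are hypothetical stand-ins for the learned convolution and pooling features the authors describe.

```python
import numpy as np


def pillarize(points, x_range, y_range, cell):
    """Scatter (x, y, z) points into vertical pillars on a 2D grid.

    Returns a (3, H, W) pseudo-image with per-pillar point count,
    mean z, and max z. Points outside the ranges are dropped.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))

    # Integer pillar index of every point along x and y.
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    ix, iy, z = ix[keep], iy[keep], points[keep, 2]

    count = np.zeros((ny, nx))
    z_sum = np.zeros((ny, nx))
    z_max = np.full((ny, nx), -np.inf)
    # Unbuffered scatter operations handle several points per pillar.
    np.add.at(count, (iy, ix), 1)
    np.add.at(z_sum, (iy, ix), z)
    np.maximum.at(z_max, (iy, ix), z)

    occupied = count > 0
    z_mean = np.zeros((ny, nx))
    z_mean[occupied] = z_sum[occupied] / count[occupied]
    z_max[~occupied] = 0.0  # empty pillars carry no height information
    return np.stack([count, z_mean, z_max])


# Toy cloud: two points share a pillar, one sits alone, one is out of range.
cloud = np.array([[0.1, 0.1, 1.0],
                  [0.2, 0.3, 3.0],
                  [1.5, 0.2, 2.0],
                  [5.0, 5.0, 9.0]])
image = pillarize(cloud, x_range=(0.0, 2.0), y_range=(0.0, 2.0), cell=1.0)
print(image.shape)  # (3, 2, 2)
```

    Once the cloud is in this dense (channels, height, width) layout, ordinary 2D convolutions apply, which is what avoids the 3D convolution cost mentioned in the abstract.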
    Method for Foreign Object Detection on Transmission Lines Using Monocular Depth Estimation
    HU Guangyi, HAN Jun, NI Yuansong, WANG Wenshuai, CHEN Keyu
    2025, 61(23):  305-315.  DOI: 10.3778/j.issn.1002-8331.2407-0097
    To address background false positives and object misses in detecting foreign objects on transmission lines, a method using monocular depth estimation is proposed. The multi-level feature fusion depth estimation network (MFFDepth) integrates semantic information from multiple feature levels within the encoder and introduces a coordinate attention module in the skip connections between the encoder and decoder, enhancing global depth perception in complex scenes. Based on the predicted depth map, depth value clustering is used to obtain the foreground image and foreground depth threshold. The YOLOX object detection network, combined with the foreground depth threshold, excludes background false positives. The DeepLabv3+ semantic segmentation network, combined with the depth foreground image, addresses the issue of foreign object detection omission. Finally, the results from these two combined detection modules are fused to improve overall detection performance. Experimental results show that the proposed method achieves an accuracy of 92.9% and a recall rate of 95.8%, which are improvements of 1.4% and 8.3%, respectively, compared to the original YOLOX algorithm, effectively enhancing the detection of foreign objects on transmission lines.
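    The abstract above does not say which clustering algorithm produces the foreground depth threshold. One simple possibility, shown here purely as an assumption, is a one-dimensional two-cluster k-means over the depth values, with the threshold placed midway between the two cluster centres. The function name foreground_depth_threshold is hypothetical, and the sketch assumes that smaller depth values are closer to the camera.

```python
import numpy as np


def foreground_depth_threshold(depth, iters=20):
    """Split depth values into a near and a far cluster (1D 2-means).

    Returns (threshold, foreground_mask); the foreground is the nearer
    cluster, assuming smaller depth values mean closer to the camera.
    """
    values = depth.ravel().astype(float)
    near, far = values.min(), values.max()  # initial cluster centres
    for _ in range(iters):
        threshold = (near + far) / 2.0
        near_vals = values[values <= threshold]
        far_vals = values[values > threshold]
        if near_vals.size == 0 or far_vals.size == 0:
            break  # degenerate case, e.g. a constant depth map
        new_near, new_far = near_vals.mean(), far_vals.mean()
        if new_near == near and new_far == far:
            break  # centres stopped moving
        near, far = new_near, new_far
    threshold = (near + far) / 2.0
    return threshold, depth <= threshold


# Toy depth map: three near pixels (about 1 m) and three far pixels (about 10 m).
depth_map = np.array([[1.0, 1.2, 9.0],
                      [0.8, 10.0, 11.0]])
threshold, foreground = foreground_depth_threshold(depth_map)
print(round(float(threshold), 2), int(foreground.sum()))
```

    A detection whose depth lies beyond the threshold could then be rejected as a background false positive, which is the role the abstract assigns to the threshold when combined with YOLOX.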
    Reinforcement Genetic Algorithm for Path Planning of UAV Power Inspection with Nest Charging
    LIANG Chenlei, LUO He, JIANG Ruhao, YIN Youlong, LIN Shizhong, WANG Guoqiang
    2025, 61(23):  316-328.  DOI: 10.3778/j.issn.1002-8331.2407-0323
    For the path planning problem of UAV power inspection with UAV nests serving as charging stations, a mathematical model is constructed to minimize the total time of UAV task execution, and a reinforcement genetic algorithm is designed to solve it. In this algorithm, a greedy population initialization operator and a split-based feasible solution generation operator are proposed, the parameter tuning process of the genetic algorithm is modeled as a Markov decision process, and a dynamic tuning strategy for the crossover probability and mutation probability is designed based on double Q-learning. In numerical experiments, comparisons with the Gurobi solver, the classical genetic algorithm, a genetic algorithm based on elite retention, and the differential evolution algorithm show that the algorithm has significant advantages in solution quality and solving speed. In addition, a case analysis comparing the algorithm with the existing inspection strategy further verifies its effectiveness in a real scenario.
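    The tuning strategy above rests on double Q-learning, which keeps two value tables and uses one to pick the best next action while the other scores it. The sketch below shows one tabular update step only. The action set ACTIONS (candidate crossover and mutation probability pairs), the integer state encoding, and all function names are hypothetical; the abstract does not specify the states, actions, or rewards the authors use.

```python
import random

# Hypothetical action set: (crossover probability, mutation probability) pairs.
ACTIONS = [(0.6, 0.01), (0.8, 0.05), (0.9, 0.10)]


def make_table(n_states):
    """Create a Q-table mapping each state to one value per action."""
    return {s: [0.0] * len(ACTIONS) for s in range(n_states)}


def double_q_update(qa, qb, state, action, reward, next_state,
                    alpha=0.1, gamma=0.9, rng=random):
    """Apply one double Q-learning update to tables qa and qb in place."""
    # Randomly choose which table is updated; the other one evaluates.
    if rng.random() < 0.5:
        update, evaluate = qa, qb
    else:
        update, evaluate = qb, qa
    # Select the greedy next action with the table being updated ...
    row = update[next_state]
    best = max(range(len(row)), key=row.__getitem__)
    # ... but score it with the other table to curb overestimation.
    target = reward + gamma * evaluate[next_state][best]
    update[state][action] += alpha * (target - update[state][action])


def choose_action(qa, qb, state, epsilon=0.1, rng=random):
    """Epsilon-greedy action choice on the sum of both tables."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    combined = [a + b for a, b in zip(qa[state], qb[state])]
    return max(range(len(ACTIONS)), key=combined.__getitem__)


# Usage: after each generation, reward the chosen parameter pair
# (for example, by the improvement in best fitness) and update.
q_a, q_b = make_table(n_states=2), make_table(n_states=2)
action = choose_action(q_a, q_b, state=0)
crossover_p, mutation_p = ACTIONS[action]
double_q_update(q_a, q_b, state=0, action=action, reward=1.0, next_state=1)
```

    In a genetic algorithm loop, the state might summarize population diversity or recent fitness progress, and the reward might be the fitness improvement after applying the chosen probabilities for one generation; both choices are left open here.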
    Real-Time Dynamic Visual-Inertial SLAM Algorithm Integrated with Gaussian Mixture Filtering
    WANG Yudong, WU Helei, XU Xuesong
    2025, 61(23):  329-339.  DOI: 10.3778/j.issn.1002-8331.2408-0379
    To address the poor robustness, low accuracy, and weak real-time performance of simultaneous localization and mapping (SLAM) in dynamic environments, a visual-inertial SLAM algorithm integrating Gaussian mixture filtering is proposed. Initially, a purpose-designed spatial sifter calculates the displacement of visual feature points from their coordinates and the prior camera rotation estimated by an error-state Kalman filter (ESKF) applied to the inertial measurement unit (IMU). Subsequently, the Gaussian distribution of feature points is sieved to obtain the initial expectation and its variance, and a Gaussian mixture model is introduced to optimize each Gaussian distribution and generate the corresponding feature point clusters. Afterwards, an optimal static cluster filtering strategy is proposed to obtain a stable static feature point cluster and estimate an accurate camera pose. Experimental results on the TUM-RGBD and VCU-RUI dynamic landmark datasets show that the proposed method outperforms VINS-Mono and its improved versions on most datasets. It achieves an average improvement of 92% in the root mean square error of absolute trajectory error compared to VINS-Mono, meets real-time requirements, and offers reference value and potential applications for SLAM research and autonomous robot navigation.
    Research on Joint Distribution Optimization of Multi-Center and Multi-Vehicle Electric Trucks Under Time-Varying Road Network
    GUO Jiawei, HUANG Zhipeng, JIA Jinxiu, MA Xiaotian, LI Jianguo, YE Binbin
    2025, 61(23):  340-350.  DOI: 10.3778/j.issn.1002-8331.2408-0441
    As the logistics industry shifts from high carbon emissions toward green, low-carbon operation, electric trucks are increasingly favored for logistics distribution. However, given the uneven temporal and spatial distribution of traffic impedance in urban road networks and the nonlinear characteristics of battery charging, traditional static vehicle routing optimization struggles to meet practical needs. To improve the delivery efficiency of electric trucks in time-varying road networks, a mixed integer programming model that minimizes the comprehensive delivery cost is constructed, taking into account a multi-center and multi-model joint distribution strategy, a partial charging strategy based on a nonlinear charging function, time windows, load, and service time. An improved K-means clustering method and a simulated annealing algorithm with memory function are designed to solve the model. A case study of several logistics parks in Shanghai verifies the effectiveness of the model and algorithm. The results show that the distribution cost difference between peak and non-peak hours is about 5.7%. Compared with the single vehicle distribution scheme, the cost of the multi-vehicle joint distribution scheme is reduced by about 5.4%. The cost of the partial charging strategy is about 5.4% lower than that of the full charging strategy. The research results provide a reference for logistics enterprises to further optimize the distribution scheme of electric trucks under urban time-varying road networks.
    School of Computer and Electronic Information, Guangxi University, Nanning 530004, China
    LI Zhijun, CHEN Qiulian
    2025, 61(23):  351-359.  DOI: 10.3778/j.issn.1002-8331.2408-0453
    To address the tendency of the bat algorithm, a heuristic intelligent algorithm, to fall into local optima and its insufficient optimization ability, an improved discrete bat algorithm is proposed with the goal of minimizing the maximum completion time. Firstly, the population is initialized by combining selection of the machine with the local minimum time and random machine selection, improving the quality and diversity of the initial population. Secondly, from the perspective of operation arrangement and machine selection, selection, superposition, and crossover operators, together with forward and reverse learning operations, are designed to improve the position update mechanism, and six neighborhood structure operations based on operation arrangement and machine selection are used to optimize the variable neighborhood search strategy and enhance the algorithm’s global and local search ability. Finally, simulation results on benchmark instances show that the improved discrete bat algorithm achieves better optimization performance.
    Fault Diagnosis Method for Fixed-Wing UAVs Integrating Attention Mechanism and Meta-Learning
    DONG Qianli, ZHANG Ansi, WU Jie, ZHAO Kaijun
    2025, 61(23):  360-367.  DOI: 10.3778/j.issn.1002-8331.2409-0051
    With the increasing application of UAVs in various fields, fault diagnosis has become crucial for ensuring their safe operation. However, traditional deep learning-based fault diagnosis methods often rely on large amounts of labeled data, leading to issues such as poor generalization performance, insufficient extraction of key features, and overfitting, especially in scenarios with small sample sizes and complex flight environments. To address these challenges, a meta-learning and efficient channel attention (MLECA) fault diagnosis method is proposed. This method aims to improve the accuracy and robustness of fault diagnosis through meta-learning. Firstly, the original sensor data are preprocessed, and meta-tasks are constructed. Secondly, to effectively capture and emphasize important features, a feature encoder combining convolutional neural networks and efficient channel attention (ECA) is established. Finally, this encoder is used as the base model, and model-agnostic meta-learning is applied to train and optimize the initialization parameters to acquire prior representational knowledge, which is then used for fixed-wing UAV fault diagnosis in unknown environments. Experimental results demonstrate that the MLECA method exhibits better overall diagnostic performance and stronger generalization capability.
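    Efficient channel attention, as generally described, pools each channel to a single value, mixes neighbouring channel descriptors with a small 1D convolution, and rescales the channels with the resulting sigmoid weights. The NumPy sketch below illustrates that general mechanism only and is not the authors' encoder. It assumes a (channels, length) layout for 1D sensor features, and the kernel weights, which would be learned in practice, are passed in as a fixed hypothetical array.

```python
import numpy as np


def eca(features, kernel):
    """Apply efficient-channel-attention-style reweighting.

    features: array of shape (C, L), one row per channel.
    kernel:   odd-length 1D weights (learned in a real network).
    """
    pooled = features.mean(axis=1)            # global average pool -> (C,)
    pad = len(kernel) // 2
    padded = np.pad(pooled, pad)              # zero-pad so output stays (C,)
    # 1D cross-correlation across neighbouring channel descriptors.
    mixed = np.correlate(padded, kernel, mode="valid")
    weights = 1.0 / (1.0 + np.exp(-mixed))    # sigmoid gate per channel
    return features * weights[:, None]


# Toy input: 4 channels, 5 time steps, with a 3-tap averaging kernel.
x = np.ones((4, 5))
y = eca(x, kernel=np.array([1.0, 1.0, 1.0]))
print(y.shape)  # (4, 5)
```

    Because the gate depends only on a short window of neighbouring channels, the attention adds just as many weights as the kernel has taps, which is what makes it attractive for a lightweight encoder used inside a meta-learning loop.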
    Instant Delivery Problem with Constrained Capacity Considering Urban Traffic Asymmetric Network
    WU Tengyu, XUE Huanhuan, FU Deqiang, YU Haiyan
    2025, 61(23):  368-376.  DOI: 10.3778/j.issn.1002-8331.2410-0277
    As the scale and range of deliveries grow, traffic accidents involving riders have become frequent. The complexity of the urban traffic network and the deviation of order capacity from the platform’s prediction force riders to adopt non-standard loading methods, such as mounting orders on handlebars, during peak hours, which significantly increases the risk of traffic accidents. It is therefore essential to consider capacity-constrained pickup and delivery strategies. Accordingly, a real-time delivery route optimization problem with capacity constraints that considers the characteristics of urban transportation networks is proposed. Firstly, the lower bound of the problem is established. Then, double judgment condition (DJC), judge path and load weighted (JPL), and wait and serve (W&S) strategies are designed for specific and general networks, and worst-case analysis is used to prove the competitive ratios of these strategies. Finally, case studies analyze the performance of the JPL and W&S strategies under different order densities, maximum asymmetry coefficients, and order capacity demand ratios, validating the algorithms’ effectiveness. The results indicate that the JPL strategy is highly applicable and performs best in urban traffic networks with higher order density, more large-capacity orders, and smaller asymmetry coefficients. The W&S strategy is more suitable for asymmetric urban transportation networks with lower order density and significant capacity demand. The study provides delivery strategies that consider capacity constraints in different cases and reduces the demand for non-standard loading through real-time path optimization, helping riders deliver safely.