Table of Contents

    2024-12-15, Volume 60 Issue 24
    Research Hotspots and Reviews
    Overview of Causal Learning Techniques and Applications
    LONG Xiangfu, LI Shaobo, ZHANG Yizong, YANG Lei, LI Chuanjiang
    2024, 60(24):  1-19.  DOI: 10.3778/j.issn.1002-8331.2405-0407
    Machine learning is the core of artificial intelligence and data science, and is widely used in education, transportation and manufacturing. With the development of machine learning and the extension of application fields, the models have revealed some problems to be solved in terms of interpretability and fairness. Causal learning (CL), as a method combining causality and machine learning techniques, can enhance the interpretability of the model and solve the problems of fairness, and its research has gradually become a hot spot in the academic world. Therefore, based on the introduction of the relevant theoretical knowledge of CL, the techniques of causal explanation, causal supervised learning, causal fairness, and causal reinforcement learning are firstly analyzed and outlined in an all-round way according to the problems that can be solved by CL. Secondly, the applications of CL in the fields of medicine, agriculture and intelligent manufacturing are summarized from multiple perspectives. Finally, some open problems and challenges of CL are summarized, and future research directions are given, aiming to promote the continuous development of CL.
    Functional Maps and Its Application in Non-Rigid 3D Shape Correspondence
    WANG Ning, ZHANG Dan, XU Chenhao, SONG Meihua, ZHANG Jianpeng, PENG Quanhong
    2024, 60(24):  20-43.  DOI: 10.3778/j.issn.1002-8331.2403-0405
    With the continuous development of 3D shape research technology, the issue of non-rigid 3D shape correspondence becomes increasingly important, with applications spanning multiple fields such as computer graphics, computer vision, and pattern recognition. The functional maps framework has achieved advanced results in non-rigid 3D shape correspondence, as it can capture complex relationships between shapes and exhibit robustness to the topological noise of non-rigid shapes. This paper first outlines the fundamental concepts and research directions of 3D shape correspondence, then it elaborates on the basic framework of functional maps. Building on this, it systematically reviews the classic works in the related field, including traditional functional maps methods and deep functional maps methods, highlighting the advantages and limitations of different approaches in addressing non-rigid shape correspondence. Subsequently, the paper introduces commonly used datasets in the field of non-rigid 3D shape correspondence, and conducts experimental comparisons and analysis of different methods. Finally, it provides an outlook on the future development trends of non-rigid 3D shape correspondence.
    Comparative Review of Text-to-Image Generation Techniques Based on Diffusion Models
    GAO Xinyu, DU Fang, SONG Lijuan
    2024, 60(24):  44-64.  DOI: 10.3778/j.issn.1002-8331.2405-0048
    With the continuous development of deep learning, artificial intelligence generated content has become a hot topic. Diffusion models in particular, as an emerging class of generative models, have made significant progress in text-to-image generation. This article comprehensively describes the application of diffusion models to text-to-image generation tasks and compares them with generative adversarial networks and autoregressive models, revealing the advantages and limitations of diffusion models. It also examines the specific methods diffusion models use to improve image quality, optimize model efficiency, and generate images from multilingual text prompts. Experimental analyses on the CUB, COCO and T2I-CompBench datasets not only validate the zero-shot generation capability of diffusion models but also highlight their ability to generate high-quality images from complex text prompts. The paper then introduces promising applications of diffusion models in fields such as text-guided image editing, 3D generation, video generation, and medical image generation. Finally, it summarizes the challenges diffusion models face in text-to-image generation and their future development trends, aiming to facilitate further research in this domain.
    Review of Joint Extraction of Medical Entities and Relationships Based on Deep Learning
    YE Qing, ZHANG Xiaofeng, PENG Lin, CHENG Chunlei
    2024, 60(24):  65-78.  DOI: 10.3778/j.issn.1002-8331.2403-0457
    Named entity recognition and relationship extraction are core tasks in the field of medical information extraction. They enable the automatic identification of entities, entity types, and relationships between entities from unstructured or semi-structured text. This capability not only facilitates the discovery and integration of knowledge, application in clinical decision-making, and enhancement of drug discovery and repurposing, but also supports public health monitoring and disease prevention. This article begins by reviewing the development of entity recognition and relationship extraction, introducing common evaluation metrics and datasets for joint entity and relationship extraction in the medical field. It highlights current challenges in the field, such as the complexity of medical text structures and the low accuracy of joint extraction. Building on these issues, the article further explores the application of deep learning-based methods for joint entity and relationship extraction in the medical field. These methods are primarily categorized into joint extraction models based on shared parameters and those based on joint decoding. The article discusses and summarizes the advantages and disadvantages of different models from a problem-solving perspective. Finally, the article discusses the challenges in joint entity and relationship extraction within the medical field and suggests future research directions.
    Review of Research on DDoS Attack Detection in SDN
    ZHENG Chengwei, WANG Haifeng, LIU Rui
    2024, 60(24):  79-96.  DOI: 10.3778/j.issn.1002-8331.2405-0367
    The emergence of software defined networking (SDN) makes up for the shortcomings of traditional networks and brings technological innovation to network management. Distributed denial-of-service (DDoS) attacks, one of the major threats in network security, seriously affect the emerging SDN architecture. With the deployment and development of SDN technology, detecting DDoS attacks in SDN has become a popular and difficult research topic. To provide a reasonable overview of related work, DDoS attack detection methods are divided into three categories according to the core technology or theory used: information entropy-based, machine learning-based, and deep learning-based. This paper introduces the SDN architecture, analyzes DDoS attacks within SDN, and presents commonly used public datasets and evaluation indicators. It then summarizes and analyzes models and algorithms proposed by researchers in recent years from four perspectives. Finally, it summarizes future research directions and prospects for DDoS attack detection in SDN, providing research ideas for researchers in this field.
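The information entropy-based category mentioned above can be illustrated with a minimal sketch. It assumes a common formulation that is not taken from the reviewed paper: the Shannon entropy of destination addresses in a window of packets drops sharply when traffic concentrates on one victim. The function names, the sample addresses, and the threshold are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_suspicious(dst_ips, threshold):
    """Flag a traffic window whose destination-IP entropy falls below `threshold`.

    The threshold is an assumption; in practice it would be tuned on real traffic.
    """
    return shannon_entropy(dst_ips) < threshold


# Hypothetical traffic windows: evenly spread vs. concentrated on one host.
normal = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
attack = ["10.0.0.1"] * 4

print(shannon_entropy(normal), is_suspicious(normal, threshold=1.0))
print(shannon_entropy(attack), is_suspicious(attack, threshold=1.0))
```

A fixed threshold is the simplest possible decision rule; the machine learning and deep learning categories replace it with a learned classifier over richer flow features.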
    Theory, Research and Development
    Improved Dung Beetle Optimization Algorithm by Hybrid Multi-Strategy
    LOU Gewei, ZHENG Yonghuang, CHEN Jun, SHEN Tingzheng, SUO Xiangbo, LIU Xuliang
    2024, 60(24):  97-109.  DOI: 10.3778/j.issn.1002-8331.2405-0187
    An improved dung beetle optimization algorithm using a hybrid multi-strategy approach is proposed to address the shortcomings of the original dung beetle optimization algorithm, such as insufficient global exploration ability, a tendency to fall into local optima, and unsatisfactory convergence accuracy. Chaotic mapping and random opposition-based learning are used to initialize the population, which improves diversity, expands the search range of the solution space, and enhances the global optimization ability. The golden sine strategy is applied to facilitate individual dynamic search and enhance the ergodicity of the algorithm. A competitive mechanism is introduced to enhance information exchange, balance global exploration with local exploitation, and accelerate the convergence of the algorithm. In the late iterations, an adaptive t-distribution mutation is introduced to provide perturbation and avoid falling into local optima. The proposed algorithm is compared with other optimization algorithms on 23 benchmark test functions. The results show that the improved algorithm has stronger optimization performance, higher convergence accuracy and better stability. The application of the proposed algorithm to engineering design examples demonstrates its effectiveness in dealing with real optimization problems.
    Qubit Mapping Algorithm for Noisy Intermediate-Scale Quantum Computers
    HUANG Hongkai, ZHANG Xuesong
    2024, 60(24):  110-118.  DOI: 10.3778/j.issn.1002-8331.2404-0196
    Due to the constraints of quantum hardware, the physical qubit pairs capable of implementing two-qubit gates are limited. To be executed on NISQ (noisy intermediate-scale quantum) computers, most quantum algorithms require additional quantum gates to be inserted, which alter the mapping between logical qubits and physical qubits. To improve the quality of the initial mapping of quantum circuits and reduce algorithm complexity and execution time, a SWAP-based optimization bidirectional heuristic search algorithm is proposed. The algorithm uses a nearest neighbor strategy to filter out a candidate queue of SWAP gates. To reduce the search space and the number of additional gates, the algorithm evaluates candidate SWAP gates with an enhanced heuristic cost function. Considering the characteristics of the quantum circuit structure, the algorithm uses a reverse traversal method to enhance mapping quality by updating the initial mapping. Moreover, the algorithm is applicable to hardware devices with arbitrary qubit coupling. Experimental results demonstrate that, compared to the mainstream IBM_QX, SPBHA and SAHA algorithms, this algorithm reduces the number of additional gates by approximately 73%, 28% and 20%, respectively, and decreases execution time by around 300%, 80% and 19%, improving the efficiency of quantum circuit mapping.
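The idea of scoring candidate SWAP gates with a heuristic cost can be sketched as follows. This is a generic distance-based heuristic under stated assumptions, not the enhanced cost function or the bidirectional search of the paper: the cost is the summed shortest-path distance on the coupling graph between the operands of pending two-qubit gates, and candidates are restricted to edges that touch one of those operands. The coupling graph, the mapping, and all function names are hypothetical.

```python
from collections import deque


def all_pairs_distances(coupling):
    """BFS shortest-path lengths on an undirected coupling graph {node: [neighbors]}."""
    dist = {}
    for src in coupling:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in coupling[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[src] = seen
    return dist


def cost(mapping, gates, dist):
    """Sum of physical distances between the operands of pending two-qubit gates."""
    return sum(dist[mapping[a]][mapping[b]] for a, b in gates)


def apply_swap(mapping, p, q):
    """Return a new logical->physical mapping after swapping physical qubits p and q."""
    swapped = {}
    for logical, phys in mapping.items():
        swapped[logical] = q if phys == p else p if phys == q else phys
    return swapped


def best_swap(mapping, gates, coupling, dist):
    """Among edges touching a gate operand, pick the SWAP that minimizes the cost."""
    active = {mapping[x] for gate in gates for x in gate}
    candidates = {tuple(sorted((p, q))) for p in active for q in coupling[p]}
    return min(sorted(candidates),
               key=lambda e: cost(apply_swap(mapping, *e), gates, dist))


# Hypothetical 4-qubit line topology 0-1-2-3 with logical qubits a and b far apart.
coupling = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dist = all_pairs_distances(coupling)
mapping = {"a": 0, "b": 3}
gates = [("a", "b")]

swap = best_swap(mapping, gates, coupling, dist)
print(swap, cost(apply_swap(mapping, *swap), gates, dist))
```

Restricting candidates to edges adjacent to active qubits is what keeps the search space small; ties are broken here by edge order, which a real mapper would replace with a look-ahead term.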
    Pattern Recognition and Artificial Intelligence
    Automatic Annotation of Knowledge Points in Picture-Based Educational Resources for Knowledge Scenarios
    WANG Jing, DU Xu, LI Hao, HU Zhuang
    2024, 60(24):  119-130.  DOI: 10.3778/j.issn.1002-8331.2308-0102
    Aiming at the challenge of inconsistency between the visual features of picture resources and the semantics of advanced knowledge, a new automatic annotation algorithm for knowledge points is proposed, called the situational hypergraph convolutional network based on knowledge scenarios (SHGCN), which can efficiently organize and manage picture data, promote knowledge understanding and utilization, and improve education intelligence. The algorithm not only extracts explicit visual features of the picture resources, but also mines knowledge information hidden in fine-grained regions. Faster R-CNN and OCR techniques are utilized to identify knowledge entities such as knowledge objects and coordinate texts, and multi-granularity features are fused to generate knowledge vectors. Then, a dual-screening mechanism is proposed to construct different types of knowledge scenarios, and the knowledge scenarios are used as hyperedges to construct a situational hypergraph to model higher-order knowledge correlations between images containing similar knowledge information. Finally, the hypergraph convolution is used to complete the information aggregation of knowledge-similar pictures, and realize the transformation from “vision-semantic” to “vision-semantic-knowledge”. This paper also constructs a physical picture dataset to train and validate SHGCN. Experimental results show that SHGCN outperforms current state-of-the-art methods by fusing explicit visual features and implicit knowledge information of pictures.
    Hash Embedding for Attributed Multiplex Heterogeneous Network
    SU Huimin, LI Qian, GUO Hongyu, LIU Yulong
    2024, 60(24):  131-139.  DOI: 10.3778/j.issn.1002-8331.2406-0061
    Heterogeneous networks have been widely utilized in many fields. However, existing network embedding methods often meet challenges when dealing with heterogeneous networks. One of them is the underutilization of node attribute information, resulting in a lack of representational power. Another challenge is the complexity of the network structure, which makes existing representations often unable to capture important features of the network, thus affecting the effect of downstream tasks. The hash embedding for attributed multiplex heterogeneous network (AMHEN) aims to solve the above problems. By integrating the attributes of nodes and the network structure information into the node embedding, the method uses the deep hash layer to learn the compact representation of nodes, so as to obtain the hash embedding. Compared with the traditional embedding method, the proposed method can better retain the important attributes of nodes, and compress the node representation into fixed length binary coding by hash technology, which improves the efficiency and scalability of the embedding. Sufficient experimental results show that the proposed hash embedding for AMHEN can significantly reduce the embedding dimension while maintaining the embedding quality, thus providing a more efficient network embedding for subsequent downstream tasks.
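The last step described above, compressing a node representation into a fixed-length binary code, can be sketched in isolation. This is only an illustration under an assumption: sign thresholding stands in for the paper's learned deep hash layer, and the function names and sample vectors are hypothetical. Binary codes are typically compared with Hamming distance, which is what makes them cheap to store and search.

```python
def binarize(embedding):
    """Map a real-valued embedding to a fixed-length binary code by sign thresholding."""
    return [1 if x >= 0 else 0 for x in embedding]


def hamming(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical real-valued node embeddings.
node_u = [0.3, -1.2, 0.0, -0.1]
node_v = [0.9, 0.4, -0.5, -2.0]

code_u = binarize(node_u)
code_v = binarize(node_v)
print(code_u, code_v, hamming(code_u, code_v))
```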
    Graph Autoencoder Framework Combining Path Masking and Dual Decoder
    ZHAO Shaohui, MA Xiao, WANG Jianxia
    2024, 60(24):  140-148.  DOI: 10.3778/j.issn.1002-8331.2308-0124
    Graph autoencoder, as a self-supervised learning method, has been widely applied in the field of graph neural networks. However, recent studies have shown that existing graph autoencoders often reconstruct the entire graph structure, leading to overfitting issues. Additionally, these methods tend to overemphasize neighbor information while neglecting structural information, resulting in poor performance on node classification tasks. To address these issues, a graph autoencoder framework based on path masking and dual decoders is proposed for graph representation learning. Firstly, the input graph is perturbed using path masking to avoid generating overfitting data. Secondly, a graph neural network is employed as the encoder to perform message passing on the remaining graph structure, enhancing the learning capability for graph data. Finally, dual decoders are introduced to reconstruct the masked edges, capturing both neighbor information and structural information. The model is evaluated on five publicly available graph datasets and compared with state-of-the-art graph representation learning methods. Experimental results demonstrate that the proposed approach achieves similar or better performance on all five datasets and outperforms baseline methods in link prediction and node classification tasks.
    MBRNet: Multi-Branch Handwritten Character Recognition Network with Integrated Residual Connection
    LI Gang, CHEN Taibing, YANG Zhibo, FAN Yi, ZHANG Ling
    2024, 60(24):  149-157.  DOI: 10.3778/j.issn.1002-8331.2308-0161
    Offline handwritten Chinese character recognition (HCCR) has long been a great challenge in the field of computer vision. Compared with traditional methods, deep learning-based networks have achieved strong results in the recognition task by training on large amounts of data, but recognition performance still has room to improve. To this end, a multi-branch residual block is designed by combining depthwise (DW) convolution operations and residual connections. In this block, the DW convolution operations increase the depth of the network and enhance feature extraction capabilities at the cost of only a small increase in memory usage and parameter count, while the residual connections facilitate a spiraling flow of data, effectively mitigating gradient and degradation issues in the network. Furthermore, a multi-branch weight algorithm is proposed to address weight allocation among the branches within the multi-branch residual block. Six multi-branch residual blocks are linearly connected to form the HCCR network. The model achieves recognition accuracies of 97.77%, 97.30%, and 97.64% on the CASIA-HWDB1.0, CASIA-HWDB1.1, and ICDAR2013 datasets, respectively.
    Knowledge Reasoning Method of Reinforcement Learning Integrating Action Withdrawal and Soft Reward
    SUN Chong, WANG Hairong, JING Boxiang, MA He
    2024, 60(24):  158-165.  DOI: 10.3778/j.issn.1002-8331.2308-0215
    Aiming at the problems of overfitting and sparse reward in deep reinforcement learning reasoning methods, a reinforcement learning knowledge reasoning method integrating action withdrawal and soft reward (AS-KRL) is proposed. AS-KRL uses a gated recurrent unit (GRU) to encode the historical path information and provide global information about the current node for the agent’s action selection. The policy network is constructed with an action withdrawal strategy that randomly hides some neurons, which improves the success rate of path search and avoids possible overfitting. The policy network guides the agent’s action selection, and a score function computes the similarity score of the triplet selected by the agent. This score is used as the agent’s reward, which effectively alleviates the sparse reward problem. To verify the effectiveness of the proposed method, experiments are carried out on the FB15K-237 and NELL-995 datasets, and the results are compared with those of 9 mainstream methods such as TransE, MINERVA and HRL. The results show that the proposed method improves Hits@k by an average of 0.027 and MRR by an average of 0.056 on the link prediction task.
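For readers unfamiliar with the two link prediction metrics reported above, the following minimal sketch computes them from the rank that a model assigns to the correct entity for each test query (rank 1 is best). These are the standard definitions; the sample ranks are hypothetical and are not results from the paper.

```python
def hits_at_k(ranks, k):
    """Fraction of queries whose correct answer is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct entity for four test queries.
ranks = [1, 2, 4, 10]

print(hits_at_k(ranks, 1), hits_at_k(ranks, 3), mean_reciprocal_rank(ranks))
```

Both metrics lie in [0, 1], so an average gain of 0.056 in MRR corresponds to 5.6 points on a percentage scale.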
    Combining Large Model Fine-Tuning and Graph Neural Networks for Knowledge Graph Question Answering
    CHEN Junzhen, WANG Shuying, LUO Haoran
    2024, 60(24):  166-176.  DOI: 10.3778/j.issn.1002-8331.2406-0301
    To address the challenges posed by inaccurate semantic parsing in traditional knowledge graph question answering systems when processing natural language queries, this paper proposes a method that integrates large model fine-tuning with graph neural networks. The approach begins with the collection of questions and the definition of their corresponding logical forms. Leveraging the robust semantic parsing capabilities of large pre-trained language models, the accuracy of question parsing is significantly enhanced through fine-tuning on question-answer pairs, where each pair includes a question and its associated logical form. Subsequently, the fuzzy set method is applied to further refine the fine-tuned logical forms, improving retrieval precision. Finally, graph neural networks are employed to perform relation projection and logical operations on these enhanced logical forms to derive the final answers. Experimental validation on standard general-domain datasets, such as WebQSP and ComplexWebQuestions, demonstrates that this method surpasses baseline models in terms of F1, Hit@1, and ACC metrics. Additionally, the method has been successfully applied and validated on domain-specific datasets, including those related to wind power equipment and high-speed trains, confirming its effectiveness in specialized domains.
    Graphics and Image Processing
    LOL-YOLO: Low-Light Object Detection Incorporating Multiple Attention Mechanisms
    JIANG Changjiang, HE Xuying, XIANG Jie
    2024, 60(24):  177-187.  DOI: 10.3778/j.issn.1002-8331.2406-0424
    Addressing the challenges in low-illumination target detection, such as blurry night scenes, indistinct boundaries, and pronounced brightness disparities, this paper introduces LOL-YOLO (low-light YOLO), a detection method based on dynamic feature fusion. A self-correcting illumination module is incorporated to enhance low-light image quality and counteract target obscurity under low illumination. A dynamic feature extraction module is proposed, which leverages an attention mechanism combining large convolutional kernels with deformable convolutions, enabling extensive and agile contextual information capture. Finally, a dynamic detection head is devised to augment perception of varying scales, spatial positions, and tasks, thereby refining detection accuracy and robustness. Experimental validation using the ExDark, DarkFace, and NPD (nighttime pedestrian detection) datasets demonstrates significant accuracy improvements over prevalent algorithms, confirming the effectiveness of the proposed method.
    Array Target Image Segmentation with Boundary-Aware SegFormer Network
    LYU Yang, WU Jingjing, ZHUANG Zhishan, AN Congying
    2024, 60(24):  188-199.  DOI: 10.3778/j.issn.1002-8331.2308-0091
    Aiming at the problem of low target segmentation accuracy due to the existence of non-uniform background, defect interference and weak edges in array target images in industrial scenarios, an array target image segmentation method with boundary-aware SegFormer network is proposed. Firstly, an adaptive seed searching strategy is proposed for the problem that fixed seeds are susceptible to background and defect interference. This strategy uses the correlation between seed location and target positioning accuracy to construct a heat map of seed distribution, and adaptively searches for ideal seed targets under the guidance of the heat map to achieve high-precision global segmentation of array targets. Secondly, the boundary-aware SegFormer network is designed for local segmentation, using recursive gated convolution to emphasize long-range and higher-order spatial interactions of features, an improved gated residual boundary refinement module to learn richer edge information, and the introduction of a hybrid loss function to enhance the supervision of the region interior and the edge pixels, which guides the network to better learn the target edge features and improve the boundary segmentation accuracy. Validation experiments on the self-built wafer dataset and the semantic segmentation dataset Cityscapes show that the proposed segmentation method is able to completely and accurately segment targets in high-resolution array target images with uneven backgrounds, defective contaminations, and low edge contrast with high real-time performance.
    Small Target Detection Algorithm for Aerial Images Based on YOLOv8n
    QI Xiangming, YAN Pingping, JIANG Liang
    2024, 60(24):  200-210.  DOI: 10.3778/j.issn.1002-8331.2405-0019
    To address the issue of dense targets and mutual occlusion in small target detection for aerial images, this paper proposes a small target detection algorithm based on YOLOv8n. The algorithm incorporates several key enhancements. Firstly, at the end of the backbone network, the Bottleneck in C2f is replaced with an improved FasterNet, maintaining the number of channels while improving convergence speed. Secondly, the SiLU activation function of CBS in SPPF is replaced with ReLU, which sets negative inputs to zero, and the SE attention mechanism is then introduced to retain more small target features. Thirdly, the efficient multi-scale attention mechanism EMA is embedded in front of the detection head, obtaining more detailed information and enhancing attention to small targets. Finally, the baseline network loss function CIoU is replaced with Wise IoU, providing a gain allocation strategy that prioritizes ordinary-quality anchor boxes and improving network generalization. Ablation and comparison experiments are conducted using the VisDrone2021 and RSOD datasets. Results show an increase in mAP@0.5 of 5.1 and 7.2 percentage points over the baseline algorithm on the two datasets, respectively, while mAP@0.5:0.95 improves by 4.4 and 2.1 percentage points. These findings demonstrate a notable improvement in detection accuracy. Generalization experiments on the publicly available dataset VOC2007+2012 show an improvement of 3.8 percentage points in mAP@0.5, demonstrating good robustness.
    YOLOv8-FD: YOLOv8 Improved Method for Detecting Surface Defects on Steel Plates
    MA Lei, LI Ye, WANG Yuxiang
    2024, 60(24):  211-221.  DOI: 10.3778/j.issn.1002-8331.2406-0223
    Steel surface defect detection is an important challenge in the field of defect detection. Missed detections and false detections remain a serious problem, and detection accuracy is directly related to product quality and may even affect life safety. At the same time, applying this technology in actual production requires saving resources and reducing cost. To solve these problems, a method based on the lightweight detection model YOLOv8-FD is introduced. Three major strategies are used: (1) A feature extraction module is added to C2f to better understand and utilize the input image information, and a DCN is introduced to enhance the feature extraction capability and improve target detection performance; (2) A DUFPN is proposed to fuse contextual features more efficiently, which drastically reduces the number of parameters and the amount of computation to make the network lightweight; (3) W-CIOU is introduced as the bounding box loss function to better measure the similarity between targets, accelerate convergence, and improve target detection accuracy. The experimental results show that, compared with the baseline, the model improves mAP by 5 percentage points and R by 3.3 percentage points, while reducing the number of parameters by 27% and the amount of computation by 35%. In addition, validation on the APSPC and VOC2007 datasets confirms that the algorithm has good robustness.
    Improved YOLOv8-Based Algorithm for Classroom Behavior Recognition of Students: DMS-YOLOv8
    CHEN Chen, BAO Wenxing, CHEN Xu, JING Yongjun, LI Weijun
    2024, 60(24):  222-234.  DOI: 10.3778/j.issn.1002-8331.2407-0132
    To address significant image size differences and small target detection challenges in smart classrooms, an improved YOLOv8 method for recognizing student behavior, DMS-YOLOv8, is proposed. Firstly, dynamic channel attention convolution (DCAConv) combines CA attention with deep convolution to dynamically adjust channel weights and capture key features. Secondly, multi-scale convolutional attention (MSCA) utilizes element-wise multiplication to enhance spatial details by maximizing multi-scale features. Additionally, a multi-scale context fusion (LCD) module is constructed to improve feature fusion using convolution and self-attention mechanisms. Finally, a small target detection layer is added to enhance the model’s ability to recognize back-row student behavior by extracting local features from larger-sized feature maps. Compared to the baseline YOLOv8n model, this method improves the mAP50 value by 4.6 percentage points on a custom student behavior dataset and by 18.7 percentage points on the VOC dataset, significantly increasing the accuracy of student classroom behavior recognition in smart classrooms.
    Low-Light Target Detection Algorithm Based on Multi-Level Feature Extraction
    TAN Hao, ZHANG Jinglei, JIA Xin
    2024, 60(24):  235-242.  DOI: 10.3778/j.issn.1002-8331.2308-0093
    Low-light scenes are prevalent in natural environments and can reduce target detection accuracy. A low-light object detection algorithm, MLFE-YOLOX, is proposed. Firstly, the lightweight image enhancement algorithm IAT is introduced to restore more details of low-illumination images. Secondly, a CSP-M module for multi-level feature extraction is designed to strengthen the performance of the feature extraction model under low light conditions. Then, the convolutional attention mechanism CBAM is introduced to adaptively measure the correlation between target position and background information, which reduces the interference caused by background information. Finally, a multi-level feature fusion module CSP-MC is designed to enhance the model’s ability to fuse multi-level features and to explore and fuse static and dynamic contextual information. The ExDark and UFDD datasets are used for experimental verification, and the experimental results show that the proposed method effectively overcomes the influence of insufficient illumination, and that its detection accuracy is significantly improved in comparison with mainstream algorithms.
    Improved Adaptive Learning Attention Network for Underwater Image Enhancement
    XU Yuan, LI Feng, YAN Jiaxiang
    2024, 60(24):  243-249.  DOI: 10.3778/j.issn.1002-8331.2308-0173
    An underwater image enhancement algorithm based on supervised learning and an adaptive learning attention network (adaptive learning attention network for underwater image enhancement, LANet) is proposed to solve the problems of high noise, serious color bias and blurred details in underwater images. Firstly, multi-scale fusion is used to strengthen the spatial information connection between channels. Secondly, the lighting features and color information are balanced by a parallel attention mechanism. Thirdly, adaptive learning is used to retain shallow information and learn important feature information adaptively. Finally, multiple loss functions are constructed to improve the network performance. The experimental results show that, compared with existing algorithms, the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) of the proposed algorithm are increased by 8.99% and 15.39%, respectively, and the underwater color image quality evaluation (UCIQE) index is improved by 1.92%, with better visual effects.
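As background for the PSNR figure quoted above, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), applied to two images flattened into lists of pixel values. It assumes 8-bit images (MAX = 255); the sample pixel values are hypothetical and the sketch does not reproduce the paper's evaluation pipeline.

```python
import math


def mse(reference, enhanced):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(reference) != len(enhanced):
        raise ValueError("images must have the same number of pixels")
    return sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(reference, enhanced)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


# Hypothetical flattened 8-bit images.
reference = [100, 100]
enhanced = [110, 90]

print(mse(reference, enhanced), psnr(reference, enhanced))
```

Because PSNR is logarithmic, a relative gain such as 8.99% in the PSNR value corresponds to a much larger relative reduction in mean squared error.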
    Lightweight Semantic Segmentation of Tobacco Main Veins Fusing Coordinate Attention and Dense Connectivity
    SU Shuailin, GAN Bomin, LONG Jie, LIU Yuchen, GAI Xiaolei, ZHANG Jiwu
    2024, 60(24):  250-259.  DOI: 10.3778/j.issn.1002-8331.2308-0228
    To address the low degree of automation in analysing the main veins of tobacco leaves, which makes it difficult to extract and recognize complex main veins, a lightweight semantic segmentation algorithm for tobacco main veins that integrates coordinate attention and mixed-connection atrous spatial pyramid pooling (MASPP) is proposed. The algorithm takes the DeepLabV3+ network model as its framework and replaces the Xception network in the original framework with the lightweight MobileNetV2, which performs backbone feature extraction in an “expanding-extracting-compressing” manner to reduce the number of parameters of the network model. The coordinate attention mechanism is introduced to strengthen the learning of subtle main vein features, which reduces the regional misclassification of the segmented main veins relative to their real distribution. The MASPP structure with “mixed-connected dense sampling” replaces the atrous spatial pyramid pooling in the original network model and alleviates the intermittent segmentation of the main veins. The experimental results show that, compared with the original DeepLabV3+ semantic segmentation algorithm, the training time is reduced from 635 min to 311 min, the mean intersection over union (mIoU) reaches 80.66%, the mean pixel accuracy (mPA) reaches 91.96%, the number of parameters in the network model is compressed by 85.32%, and the storage space is reduced to 30.63 MB.
    DB-YOLO: Dual Backbone YOLOv8 Model with Feature Enhancement Fusion for Road Defect Detection
    YE Famao, ZHANG Li, YUAN Liao, LI Dajun
    2024, 60(24):  260-269.  DOI: 10.3778/j.issn.1002-8331.2404-0019
    Although many deep learning-based road defect detection methods have been proposed, they usually ignore edge feature information, which is very important in road defect detection tasks. In order to make full use of this high-frequency information, this paper proposes an improved dual backbone YOLOv8 model (DB-YOLO) for road defect detection. Firstly, an edge feature extraction model (EFEM) is designed to filter the low-frequency information of the image and extract its high-frequency edge information. Secondly, a dual backbone network is designed to extract features. An edge feature backbone (EFB) is added to the original model to process the high-frequency edge information extracted by EFEM, extract edge features, and provide richer features for road defect detection. Finally, a new feature enhancement fusion module (FEFM) is proposed, and multiple FEFM modules are used to fuse edge features and image features of different levels. In addition, a label smoothing strategy is introduced to weaken the impact of label quality in the dataset, enhance the generalization ability of the model, and further improve its detection accuracy. Experimental results show that on the GRDDC2020 dataset, the mAP and F1 of DB-YOLO_v8s reach 56.42% and 56.13% respectively, which are 1.3 and 1.96 percentage points higher than those of YOLO_v8s. The detection speed reaches 64.94 frames per second, meeting real-time detection requirements. In addition, the F1 scores of DB-YOLO_v8s on the official test sets Test_1 and Test_2 are 58.79% and 58.52% respectively, which are 0.65 and 1.37 percentage points higher than those of other methods. Therefore, the proposed model can improve road defect detection accuracy.
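    The label smoothing strategy mentioned in this abstract is commonly implemented by mixing each one-hot target with a uniform distribution. The sketch below shows that common uniform variant only; the abstract does not say which variant or smoothing factor the authors use, so the function name `smooth_labels` and the default `epsilon=0.1` are assumptions.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Uniform label smoothing: (1 - epsilon) * y + epsilon / K.

    `epsilon` is an assumed value, not one reported in the abstract.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    num_classes = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / num_classes for y in one_hot]


if __name__ == "__main__":
    # Four hypothetical defect classes, true class at index 1.
    print(smooth_labels([0.0, 1.0, 0.0, 0.0]))
```

    Because the smoothed target never reaches exactly 0 or 1, a mislabelled sample pulls the model less strongly toward the wrong class, which is consistent with the reduced sensitivity to label quality described above.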
    Network, Communication and Security
    MorViT Fingerprint Recognition Model for Tor Darknet Traffic
    ZHU Yi, CAI Manchun, YAO Lifeng, ZHANG Yiwen, CHEN Yonghao
    2024, 60(24):  270-281.  DOI: 10.3778/j.issn.1002-8331.2308-0104
    The frequent occurrence of network attacks has led to the emergence of anonymous communication systems to protect user privacy. However, these systems have also been exploited by malicious actors to create the dark web for various illegal activities. Monitoring and identifying dark web traffic are crucial for maintaining network security. To address this issue, the MorViT model for Tor dark web traffic fingerprinting is proposed. The model transforms traffic data into images for visualization and model input. It incorporates one-dimensional inverted residual structures, two-dimensional inverted residual structures, and MobileViT modules to extract both local features of traffic and global features with long-range dependencies. To overcome the limitations of Transformers on small datasets, the model introduces learnable temperature coefficients and diagonal masking to enhance local inductive capabilities. Experimental results demonstrate that the MorViT model outperforms existing models in terms of classification accuracy, recall rate, and AUC in closed-world and open-world scenarios, effectively achieving Tor dark web traffic fingerprint recognition tasks.
    Byzantine Algorithm for Collaborative Optimization of Recommendation Reputation Model and Cluster Analysis
    LI Heji, WANG Chuanhua, XU Xin
    2024, 60(24):  282-290.  DOI: 10.3778/j.issn.1002-8331.2310-0150
    A modified Byzantine fault-tolerant algorithm based on a recommendation reputation model and clustering analysis is proposed to address the random selection of main nodes, high communication complexity, and high consensus delay in the traditional PBFT (practical Byzantine fault tolerance) consensus algorithm. Firstly, based on the recommendation reputation model, the global trust value of each node is calculated from the transaction behavior between nodes. Secondly, the global trust values are used to divide nodes into consensus nodes, non-consensus nodes, and main group nodes, and the node with the highest global trust value is made the main node, which greatly reduces the probability of a malicious node becoming the main node and thereby improves the efficiency of the system. Finally, after each round of consensus, the consensus nodes are clustered based on their characteristics to further update the global trust values of the nodes. Simulation experiments show that the improved TK-PBFT algorithm reduces consensus latency by 25%, reduces communication overhead by more than 50%, and achieves higher throughput.
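    The role assignment described in this abstract can be illustrated with a small sketch. It assumes the global trust values are already computed and uses a hypothetical trust threshold and main group size to split the nodes; neither parameter, nor the function name `assign_roles`, comes from the paper, whose actual partition rule is not given in the abstract.

```python
def assign_roles(trust, consensus_threshold=0.3, main_group_size=2):
    """Split nodes into roles by global trust value.

    `consensus_threshold` and `main_group_size` are hypothetical parameters
    chosen for illustration only.
    """
    # Rank by trust, highest first; ties broken by node id for determinism.
    ranked = sorted(trust, key=lambda node: (-trust[node], node))
    eligible = [n for n in ranked if trust[n] >= consensus_threshold]
    if not eligible:
        raise ValueError("no node meets the consensus threshold")
    main_group = eligible[:main_group_size]
    return {
        "main_node": main_group[0],  # node with the highest global trust
        "main_group": main_group,
        "consensus": eligible[main_group_size:],
        "non_consensus": [n for n in ranked if trust[n] < consensus_threshold],
    }


if __name__ == "__main__":
    roles = assign_roles({"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.2})
    print(roles)
```

    Picking the main node deterministically from the top of the trust ranking, rather than at random or by rotation, is what lowers the chance that a low-trust (possibly malicious) node leads a consensus round.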
    Engineering and Applications
    Pavement Disease Detection Algorithm Focusing on Shape Features
    DENG Tianmin, CHEN Yuetian, YU Yang, XIE Pengfei, LI Qingying
    2024, 60(24):  291-305.  DOI: 10.3778/j.issn.1002-8331.2404-0259
    Automatic pavement disease detection is a crucial technology for achieving intelligent road management. To address the challenges posed by small disease targets in pavement images, significant variations among different types of diseases, and complex background environments, an algorithm named FSF-YOLO (focusing on shape features YOLO), based on the YOLOv8 architecture, is proposed. The algorithm incorporates an enhanced feature extraction module designed to retain multi-dimensional spatial feature information, thereby enhancing the backbone network’s capability to extract features from low-resolution images and small disease targets. Additionally, it introduces a deformable attention feature fusion module that leverages the elongated shape features of diseases to expand the target recognition area and improve the model’s feature expression ability for long-distance disease targets. Furthermore, the algorithm uses a grouped convolution spatial pyramid pooling module to strengthen the recognition of disease targets of varying sizes. Lastly, it employs lightweight shared convolutional detection heads to reduce both the number of network parameters and the computational load. Experimental results demonstrate that the proposed method offers superior performance in detecting various types of pavement diseases, with an average accuracy of 67.3% on the RDD2022 dataset, a 5.3 percentage point improvement over the original algorithm.
    Improved U-Net Pavement Crack Detection Method
    ZHANG Mingxing, XU Jian, LIU Xiuping, ZHANG Yongjin, ZHANG Chuang, NING Xiaoge
    2024, 60(24):  306-313.  DOI: 10.3778/j.issn.1002-8331.2307-0358
    To address the weak performance of the basic U-Net on pavement crack segmentation, including insufficiently fine crack contours, difficulty in identifying narrow cracks, and low segmentation accuracy, this paper proposes an improved U-Net-based pavement crack segmentation method. Firstly, an improved ResNet50 is used as the backbone network to extract pavement crack features. Secondly, an attention mechanism-based feature fusion module is designed to improve the skip connections of U-Net. Finally, the improved model is obtained by adding a feature refinement head in the decoding part. A self-built pavement crack dataset is used to compare the proposed model with current state-of-the-art models, and ablation experiments are conducted on the model before and after optimization. The experimental results show that the mIoU, Precision, and mPA of the proposed model on the self-built pavement crack dataset reach 0.8381, 0.8928, and 0.9169 respectively, which are 0.019, 0.0168, and 0.0232 higher than those of the baseline U-Net, and that the inference speed of 40.02 FPS can meet the needs of engineering applications. Finally, experiments on the open-source Crack500 dataset verify that the proposed model has stronger performance and generalization ability than network models such as U-Net and DeepLabV3+.
    Research on Influence of Vehicle Type on Traffic Flow Speed Under Target Detection
    XU Huizhi, CHANG Mengying, CHEN Yinan, HAO Dongsheng
    2024, 60(24):  314-321.  DOI: 10.3778/j.issn.1002-8331.2307-0389
    To address the shortcomings of traditional target detection and tracking algorithms, such as low recognition accuracy and poor real-time performance, a real-time traffic flow and speed detection method for vehicles in video, based on the YOLOv5s and DeepSort algorithm models, is proposed. A dataset containing 25,877 target samples is constructed. The YOLOv5s model is used to detect vehicles in video, and the DeepSort algorithm is used to track and count vehicles and measure their speed, realizing real-time vehicle detection from road section monitoring. Based on the speed and traffic volume data obtained by deep learning, a model of the influence of different vehicle types on traffic flow speed is constructed to explore the relationship between the speed of each vehicle type and the traffic volume in the smooth, stable, and congested states. The results show that the algorithm model detects vehicles in video effectively, with an average accuracy of 91.4%, and that different vehicle types influence traffic flow speed differently in different states. In the smooth and stable states, the speed of taxis has a greater impact on the traffic volume. In the congested state, the speed of private cars is more affected by the traffic volume.
    Dynamic Heterogeneous Network Representation Learning for Fraud Detection in Auto Insurance
    PAN Yijun, LIANG Bian, ZHANG Long, NA Chongning
    2024, 60(24):  322-330.  DOI: 10.3778/j.issn.1002-8331.2308-0078
    To address the challenges posed by the diversity and heterogeneity of auto insurance data and the large amount of historical data, a dynamic heterogeneous network representation learning method for fraud detection in auto insurance is proposed. A graph is used to represent nodes with different structures and rich attributes as vectors, and a traditional machine learning algorithm is employed for fraud detection. Firstly, five random walk rules are designed based on the fraud types in auto insurance, which provide multiple perspectives for describing fraud events. Secondly, a dynamic heterogeneous network node selection method is proposed to identify the nodes relevant to newly collected auto insurance cases, and the frequency of each node in historical cases is calculated. The random walk paths and vector representations of these nodes are dynamically updated at each new timestamp. Finally, the effectiveness of the proposed algorithm is tested on real auto insurance data in terms of fraud detection rate, fraud alarm rate, fraud detection accuracy, running time, number of nodes, and window size.
    Nonwoven Defect Detection by Fusing Selection Kernel Attention
    LU Yunting, KANG Shaopeng, WU Shuang, HE Chuan
    2024, 60(24):  331-339.  DOI: 10.3778/j.issn.1002-8331.2308-0182
    To address the poor real-time performance and low detection accuracy of non-woven defect detection algorithms, a non-woven defect detection algorithm, N-YOLO, based on an improved YOLOv5 is designed. The algorithm applies visual detection technology according to the actual conditions of the production line and the characteristics of the product. Firstly, the FasterNet network is introduced into YOLOv5 as the backbone feature extraction network for a lightweight design, and partial convolution is used for feature extraction to reduce the model computation. Secondly, the SK attention mechanism is added to the C3 module to improve detection accuracy, and the WIoUv1 loss function is used to calculate the bounding box regression loss to improve bounding box localization accuracy. Experimental results show that, compared with YOLOv5, the N-YOLO algorithm reduces floating-point computation by 85.4% and the number of parameters by 52%, from 7,020,913 to 3,368,105; the model size is 6.63 MB, and the average detection accuracy and the recall rate both reach 99.2%. Compared with target detection algorithms such as Faster R-CNN and SSD, it has obvious advantages and can detect non-woven defects in real time under high-speed production without expensive hardware.
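    The reported parameter reduction can be checked with simple arithmetic. The sketch below uses only the two parameter counts quoted in the abstract; the helper name `percent_reduction` is illustrative.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


if __name__ == "__main__":
    yolov5_params = 7_020_913  # parameter count quoted for YOLOv5
    n_yolo_params = 3_368_105  # parameter count quoted for N-YOLO
    reduction = percent_reduction(yolov5_params, n_yolo_params)
    print(f"{reduction:.1f}%")  # prints "52.0%"
```

    The result agrees with the 52% figure stated in the abstract.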
    Forest Smoke Detection Method Without Open Flames Based on Improved YOLOv7
    WANG Haowen, PIAO Yan, WANG Yue, JIANG Pinyi
    2024, 60(24):  340-350.  DOI: 10.3778/j.issn.1002-8331.2308-0349
    Rapid and accurate detection of forest fires is of great significance to forest fire prevention. However, existing forest smoke detection models extract only a single smoke feature and therefore perform poorly in the fire detection task when the image contains only smoke and no visible fire. To address this problem, an improved YOLOv7-based smoke detection algorithm for forests without open flames is proposed. The algorithm introduces the attention mechanism CA and the fully convolutional masked autoencoder framework FCMAE into the backbone network, so that the model obtains richer local information while extracting semantic features and avoids the feature collapse problem of existing models. Meanwhile, a centralized feature pyramid (CFP) is introduced into the prediction network to strengthen the intra-layer regulation of features. In addition, the model uses the Wise-IoU loss function with a dynamic non-monotonic focusing mechanism (FM) to strengthen the detection of low-quality smoke samples. The experimental results show that, compared with other models, this model performs better in detecting smoke without open flames, with an accuracy of 98.1% and an mAP@50 of 99.1%.