Contents of the Graphics and Image Processing section in this journal

    Lightweight Traffic Monitoring Object Detection Algorithm Based on Improved YOLOX
    HU Weichao, GUO Yuyang, ZHANG Qi, CHEN Yanyan
    Computer Engineering and Applications    2024, 60 (7): 167-174.   DOI: 10.3778/j.issn.1002-8331.2308-0081
    Traffic target detection technology is an important tool for traffic management departments in key tasks such as traffic monitoring and safety surveillance. Given the large volume of traffic monitoring scene data, detection techniques are needed that offer fast detection speed, high accuracy and low computational resource usage. To meet this need, this paper proposes PL-YOLO, a lightweight traffic target detection algorithm for traffic monitoring scenes built on the YOLOX algorithm and the PP-LCNet network. Furthermore, considering the dense distribution and small size of vehicles in traffic monitoring scenes, the SimAM attention module is added to focus on more meaningful features. Experimental results demonstrate that, compared with the YOLOX-s model, PL-YOLO improves detection accuracy by 1.89 percentage points, reduces the model size by 54%, and raises the detection speed from 20.88 frames/s to 26.68 frames/s.
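    The SimAM module cited above is a published parameter-free attention mechanism. As a point of reference, a minimal PyTorch sketch of the standard SimAM formulation is given below; it is not the authors' code, and the regularization constant e_lambda is an assumed default.

        import torch

        def simam(x: torch.Tensor, e_lambda: float = 1e-4) -> torch.Tensor:
            """Parameter-free SimAM attention: re-weight each activation by an
            energy-based importance score (more distinctive neurons get larger weights)."""
            # x: (batch, channels, height, width)
            n = x.shape[2] * x.shape[3] - 1
            # squared deviation of every spatial position from its channel mean
            d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
            # per-channel variance estimate
            v = d.sum(dim=(2, 3), keepdim=True) / n
            # inverse energy; the 0.5 offset follows the SimAM paper
            e_inv = d / (4 * (v + e_lambda)) + 0.5
            return x * torch.sigmoid(e_inv)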
    Improved YOLOv7 Algorithm for Wood Surface Defect Detection
    JIANG Xingwang, ZHAO Xingqiang
    Computer Engineering and Applications    2024, 60 (7): 175-182.   DOI: 10.3778/j.issn.1002-8331.2309-0185
    High-quality wood is in great demand, but surface defects lead to low yield and low utilization of such wood. Deep learning object detection algorithms enable rapid and stable detection of wood surface defects, thereby improving the quality and utilization of wood. A wood surface defect detection model, YOLOv7-ESS, based on improved YOLOv7 is proposed to address the poor detection accuracy caused by the small size, dense distribution and complex shapes of wood surface defects. Firstly, to deal with the extreme aspect ratios that hamper the detection of wood crack defects, an attention module ECBAM is embedded to strengthen the model's feature extraction ability by increasing attention to extreme-aspect-ratio defects. Secondly, to counter the severe loss of feature information for small defects on the wood surface during feature extraction, a shallow weighted feature fusion network SFPN is introduced, which uses deep feature maps as output and effectively exploits shallow feature information to improve the recognition accuracy of small defects. Finally, the SIoU loss function is introduced to improve the convergence speed and accuracy of the model. The results show that the average detection accuracy of the YOLOv7-ESS model is 94.7%, which is 11.2 percentage points higher than that of YOLOv7 and meets the defect detection requirements of wood production and processing.
    DY-YOLOv5: Target Detection for Aerial Image Based on Multiple Attention
    ZHAO Xin, CHEN Lili, YANG Weichuan, ZHANG Chengwang
    Computer Engineering and Applications    2024, 60 (7): 183-191.   DOI: 10.3778/j.issn.1002-8331.2309-0419
    Aiming at the problem of low detection accuracy caused by small targets, different scales and complex backgrounds in UAV aerial images, a target detection algorithm for UAV aerial images based on improved YOLOv5 is proposed. The algorithm introduces a target detection head method Dynamic Head with multiple attention mechanisms to replace the original detection head and improves the detection performance of the detection head in complex backgrounds. An upsampling and Concat operation is added to the neck part of the original model, and a multi-scale feature detection including minimal, small and medium targets is performed to improve the feature extraction ability of the model for medium and small targets. DenseNet is introduced and integrated with the C3 module of YOLOv5s backbone network to propose the C3_DenseNet module to enhance feature transfer and prevent model overfitting. The DY-YOLOv5 algorithm is applied to the VisDrone 2019 dataset, and the mean average precision (mAP) reaches 43.9%, which is 11.4 percentage points higher than the original algorithm. The recall rate (Recall) is 41.7%, which is 9.0 percentage points higher than the original algorithm. Experimental results show that the improved algorithm significantly improves the accuracy of target detection in UAV aerial images.
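    The abstract does not spell out the internals of the C3_DenseNet module; the sketch below only illustrates the dense-connection idea it builds on, where every convolution sees the concatenation of the block input and all earlier outputs. The class name, growth rate and layer count are assumptions for illustration.

        import torch
        import torch.nn as nn

        class DenseBlock(nn.Module):
            """Minimal DenseNet-style block: each conv receives the concatenation
            of the block input and all previous conv outputs."""
            def __init__(self, in_ch: int, growth: int = 32, layers: int = 3):
                super().__init__()
                self.convs = nn.ModuleList()
                ch = in_ch
                for _ in range(layers):
                    self.convs.append(nn.Sequential(
                        nn.Conv2d(ch, growth, kernel_size=3, padding=1, bias=False),
                        nn.BatchNorm2d(growth),
                        nn.SiLU(inplace=True)))
                    ch += growth  # the concatenated input grows with every layer

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                feats = [x]
                for conv in self.convs:
                    feats.append(conv(torch.cat(feats, dim=1)))
                return torch.cat(feats, dim=1)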
    Hyperspectral Image Classification Based on Double Branch Multidimensional Attention Feature Fusion
    MA Yamei, WANG Shuangting, DU Weibing
    Computer Engineering and Applications    2024, 60 (7): 192-203.   DOI: 10.3778/j.issn.1002-8331.2211-0139
    To improve the classification performance of small sample classes of hyperspectral images and to enhance the robustness of the model feature representation, a neural network classification model with two-branch multidimensional attentional feature fusion (DBMD) is proposed. DBMD uses two branches for spectral feature extraction and hybrid feature extraction respectively. The spectral branch extracts features step-by-step through densely connected dilated convolution, and then fuses low, medium and high level semantic information as the feature output. The hybrid branch uses a 3D-2D network architecture and extracts spatial scale features through improved Inception blocks. In addition, the attention mechanism is applied to spectral, spatial and spatial-spectral feature extraction respectively for feature refinement and to enhance the feature response in important regions. Finally, the refined features of different dimensions are jointly input to the classifier for classification. Experiments using 5% and 1% samples on the Indian Pines and Salinas Valley datasets achieve an overall accuracy of 98.40% and 99.78% respectively, and the proposed model performs better in terms of accuracy and stability compared to the other six network architectures.
    VR Interactive 3D Virtual Crane Modeling and Simulation
    HUANG Kaige, HUI Yanbo, LIU Yonggang, WANG Hongxiao, WANG Qiao
    Computer Engineering and Applications    2024, 60 (7): 204-211.   DOI: 10.3778/j.issn.1002-8331.2211-0173
    As widely used special equipment, cranes are highly dangerous in operation and prone to safety accidents. In order to reduce accidents caused by improper operation, the state attaches great importance to crane safety training. Current training is mostly traditional demonstration-based training, with high cost and poor results. Virtual reality technology offers immersion, interaction and multi-sensory perception; based on this, this study uses virtual reality technology to build a crane training and assessment system, which greatly improves the effect of worker training. In order to reproduce the real working scene of the crane, reverse engineering technology is first used to model the crane and the crane workshop. Secondly, in view of the poor realism of the 3D virtual scene and the poor reusability of interactive models when a large number of models are present, the level of detail (LOD) technique is used to build the geometric model of the crane, optimizing the realism and real-time performance of the system. Then, 3D scene roaming, collision detection and fast navigation of the crane are realized on the Unity platform, and the training data and key crane parameters are stored in real time in a MySQL database. Finally, the virtual platform is validated against an existing bridge crane platform. The results show that the crane virtual reality training system can greatly improve the sensory quality of training while reducing training cost, and achieves good results in experiments.
    Generative Adversarial Network with Dual Discriminator and Mixed Attention
    WANG Lei, YANG Jun, ZHANG Chiyu, DAI Zaiyan
    Computer Engineering and Applications    2024, 60 (7): 212-221.   DOI: 10.3778/j.issn.1002-8331.2211-0196
    In image generation tasks, how to improve the quality of generated images is a key problem. The multi-layer convolutional structure adopted by current GANs suffers from local inductive bias and cannot focus on key information, so image features are lost during training. In this paper, a generative adversarial network with a dual discriminator and mixed attention, termed DDMA-GAN, is proposed. Firstly, DDMA-GAN designs a mixed attention mechanism that uses channel attention and spatial attention to fully capture image feature information. Secondly, to solve the problem of discrimination error with a single discriminator, a dual discriminator structure is proposed: a fusion coefficient is used to fuse the two judgments so that the returned parameters are more objective, and a data augmentation module is embedded to further improve the robustness of the model. Finally, the hinge loss is used as the loss function to enlarge the margin between real and fake samples. The model is verified on the public datasets LSUN and CelebA. Experimental results show that images generated by DDMA-GAN on these classical datasets are more realistic, and the FID and MMD of DDMA-GAN are significantly reduced, which fully indicates the validity of the model.
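    A rough sketch of how a dual-discriminator hinge loss with a fusion coefficient can be written is shown below. The blending rule and the coefficient name alpha are assumptions; the paper's exact fusion of the two discriminators' judgments is not given in the abstract.

        import torch
        import torch.nn.functional as F

        def fused_hinge_d_loss(d1_real, d2_real, d1_fake, d2_fake, alpha: float = 0.5):
            """Discriminator hinge loss where two discriminators' scores are blended
            with a fusion coefficient alpha before the margin is applied."""
            real = alpha * d1_real + (1 - alpha) * d2_real
            fake = alpha * d1_fake + (1 - alpha) * d2_fake
            return F.relu(1.0 - real).mean() + F.relu(1.0 + fake).mean()

        def fused_hinge_g_loss(d1_fake, d2_fake, alpha: float = 0.5):
            """Generator hinge loss against the fused discriminator score."""
            return -(alpha * d1_fake + (1 - alpha) * d2_fake).mean()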
    Image Feature Classification Based on Multi-Agent Deep Reinforcement Learning
    ZHANG Zewei, ZHANG Jianxun, ZOU Hang, LI Lin, NAN Hai
    Computer Engineering and Applications    2024, 60 (7): 222-228.   DOI: 10.3778/j.issn.1002-8331.2211-0129
    To address the high complexity of input image data in machine learning tasks such as image feature recognition and classification, a multi-agent deep reinforcement learning method for image feature classification is proposed. Firstly, the image feature classification task is formulated as a partially observable Markov decision process. Multiple moving homogeneous agents collect parts of the image, and the method studies how the agents form a local understanding of the image and take actions, and how relevant features are extracted and classified from locally observed patches, so as to reduce data complexity and filter out irrelevant data. Secondly, an improved value function decomposition method is used to train the agents' policy networks, dividing the global return of the environment according to the contribution of each agent, thereby addressing the credit assignment problem among agents. The proposed method is verified on the MNIST handwritten digits dataset and the NWPU-RESISC45 remote sensing image dataset. Compared with baseline algorithms, it learns more effective association strategies, and the classification process is more stable with improved accuracy.
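    The abstract describes an improved value-function decomposition that splits the global return by each agent's contribution. For orientation only, the sketch below shows the plain additive (VDN-style) decomposition that such methods start from; it does not include the contribution weighting described above.

        import torch
        import torch.nn.functional as F

        def vdn_td_loss(per_agent_q, per_agent_q_next, reward, done, gamma: float = 0.99):
            """TD loss with additive value decomposition: the joint Q-value is the
            sum of each agent's Q-value for its chosen action."""
            # per_agent_q, per_agent_q_next: (batch, n_agents) chosen-action values
            q_tot = per_agent_q.sum(dim=1)
            q_tot_next = per_agent_q_next.sum(dim=1).detach()
            target = reward + gamma * (1.0 - done) * q_tot_next
            return F.mse_loss(q_tot, target)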
    Camouflage Object Detection Algorithm Based on Edge Attention and Reverse Orientation
    HE Wenhao, GE Haibo, CHENG Mengyang, AN Yu, MA Sai
    Computer Engineering and Applications    2024, 60 (7): 229-237.   DOI: 10.3778/j.issn.1002-8331.2211-0211
    Camouflage object detection (COD) has important application value in many fields. The existing COD algorithm mainly focuses on the expression of the features extracted from the backbone network and the problem of feature fusion, ignoring the problems of focusing on the edge features of the object and inferring the real area of the object. Aiming at the above problems, a camouflaged object detection algorithm based on edge attention and reverse positioning is proposed. The algorithm consists of edge attention module (EAM), close integration module (CIM) and reverse positioning module (RPM). First, the EAM module is used in the feature encoding stage to enhance the expression of multi-level features extracted from the Res2Net-50 backbone network and highlight edge features. Then, the CIM module is used for the fusion of multi-level features to reduce the loss of feature information. Finally, the RPM module is used to process the rough prediction maps from different feature pyramids, reverse localize the real region of the object, and infer the real object. Experiments on 3 public datasets show that the proposed algorithm outperforms the other 8 state-of-the-art models. On the COD10K dataset, the mean absolute error (MAE) reaches 0.038.
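    The reverse positioning module is not detailed in the abstract; the fragment below sketches the related reverse-attention idea, in which features are weighted by the complement of the coarse prediction so that later stages refine regions not yet claimed as object. It is illustrative only.

        import torch

        def reverse_attention(features: torch.Tensor, coarse_pred: torch.Tensor) -> torch.Tensor:
            """Weight features by 1 - sigmoid(coarse prediction) so the next stage
            attends to regions the coarse map has not yet marked as object."""
            # features: (B, C, H, W); coarse_pred: (B, 1, H, W) logits
            reverse_weight = 1.0 - torch.sigmoid(coarse_pred)
            return features * reverse_weight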
    Learning Gaussian-Aware Constraint Spatial Anomaly for Correlated Filter Target Tracking
    JIANG Wentao, WANG Zimin, ZHANG Shengchong
    Computer Engineering and Applications    2024, 60 (7): 238-247.   DOI: 10.3778/j.issn.1002-8331.2211-0408
    To address target loss during tracking under complicated motion, a target tracking algorithm with Gaussian-aware constrained spatial anomaly is proposed. Firstly, the feature sampling points of the target are established following a Gaussian uniform distribution, and the appearance model and weight model of the target are extracted with a convolutional structure. Secondly, to constrain spatial anomalies, spatial regularization terms are constructed in the objective function; at the same time, the target weight model is updated to minimize spatial overfitting, thereby enhancing the tracker's adaptability to spatial anomalies. Lastly, the weighted least squares method is applied to locate the center of the weight response model, which determines the target center and updates the tracking position, thereby enhancing the robustness of the tracker. On the OTB2015 and UAV20L datasets, the proposed algorithm, compared with other mainstream correlation filtering algorithms, achieves high tracking success rate and tracking accuracy under complicated conditions such as low resolution and occlusion caused by target motion.
    Hand Pose Estimation Based on Multi-Feature Enhancement
    FENG Xinxin, GAO Shu
    Computer Engineering and Applications    2024, 60 (6): 207-213.   DOI: 10.3778/j.issn.1002-8331.2210-0089
    Hand pose estimation is one of the important research directions of computer vision and plays an important role in application fields such as human-computer interaction, virtual reality and robot control. At present, hand pose estimation suffers from a single feature representation method. This paper proposes a feature construction method based on the connection relationships between hand key points and a key point feature aggregation and enhancement method based on the semantic relationships of hand motion, to improve hand feature representation and information sharing. To deal with occlusion in hand target detection and image segmentation, a hand contour feature extraction method is designed to improve the preprocessing effect. Based on the proposed multi-feature representation and enhancement methods, a deep learning neural network model with a fully convolutional structure is constructed to avoid the loss of spatial information caused by directly regressing 3D pose information, thus effectively improving the accuracy of 3D hand pose estimation. Compared with SOTA models on the DO, ED and RHD datasets, it achieves competitive results, with an average AUC of 93.3%, indicating that the proposed method also generalizes well.
    Commonsense Oriented Fine-Grained Data Augmentation
    LI Huachao, KANG Bin, WANG Lei
    Computer Engineering and Applications    2024, 60 (6): 214-221.   DOI: 10.3778/j.issn.1002-8331.2210-0361
    Representative research on data augmentation has mainly been carried out on common classification benchmark datasets such as ImageNet. Because the intra-class and inter-class relations in fine-grained visual classification (FGVC) datasets differ greatly from those in ordinary classification datasets, data augmentation methods for FGVC need further study. Therefore, starting from the fine-grained recognition task and the special properties of its datasets, this paper proposes a commonsense-guided fine-grained semantic image patch mixing method (ComSipmix). The proposed method exploits commonsense knowledge to mine potential associations between sample labels and, based on this, designs a multi-branch convolutional neural network structure for a structured image mixing strategy, so that the image mixing process pays more attention to the subtle differences between targets. Extensive performance tests verify that the proposed method significantly outperforms mainstream image-mixing-based data augmentation methods. At the same time, experiments confirm that the commonsense knowledge introduced in this paper helps improve the performance of various data augmentation models based on mixed images.
    CME-Based Few-Shot Detection Model with Enhanced Multiscale Deep Features
    DING Zhengwei, BAI Hexiang, HU Shen
    Computer Engineering and Applications    2024, 60 (6): 222-229.   DOI: 10.3778/j.issn.1002-8331.2211-0419
    A CME-based few-shot detection model with enhanced multiscale deep features is proposed to address the problems that existing few-shot detection models take insufficient account of the global semantic information of images and that detector performance degrades with varying input image sizes. Firstly, the model is trained with a large amount of labeled base-class data using a multilayer convolutional neural network with residual skip connections and a multiscale feature enhancement module with good generalization; the model is then fine-tuned with a small amount of labeled new-class data together with base-class data, and finally the fine-tuned model is used for target detection. To verify the effectiveness of the model, the VOC2007 and VOC2012 datasets are used for training and evaluation. Ablation experiments demonstrate that the multilayer convolutional neural network with residual skip connections and the multiscale feature enhancement module each improve the accuracy of the model, both individually and in combination. Comparison experiments with six representative few-shot target detection models show that CME with enhanced multiscale deep features outperforms the state-of-the-art detector by an average of 4.75 percentage points.
    Small Target-Oriented Multi-Space Hierarchical Helmet Detection
    LI Jiaxin, HU Yang, HUANG Xiezhou, LI Hongjun
    Computer Engineering and Applications    2024, 60 (6): 230-237.   DOI: 10.3778/j.issn.1002-8331.2210-0353
    Because factors such as small targets and long distances in the target video affect the detection effect, small targets are difficult to capture. A multi-spatial hierarchical helmet-wearing detection algorithm for small targets is proposed in this article, tailored and improved on the basis of the YOLOv5s network model. Firstly, a multi-spatial attention module is designed to consider the effects of spatial features from different perspectives and fuse them, strengthening the spatial location relationships of salient features. Secondly, features at multiple spatial scales are fused while combining multiple features in the feature extraction process, adapting to the capture of targets at different spatial levels and improving the detection of small targets. Thirdly, data augmentation is used to improve the generalizability of the dataset so that the training targets cover more diverse scenarios. Finally, the loss function is optimized to enhance the regression capability and improve the training effect. The experimental results show that the proposed algorithm achieves an average accuracy of 91.5% and significantly reduces missed detections. In addition, the proposed algorithm has been deployed on real construction sites and shows superior performance in detecting small targets, which is of great practical value.
    Expression Recognition Combining 3D Interactive Attention and Semantic Aggregation
    WANG Guangyu, LUO Xiaoshu, XU Zhaoxing, FENG Fangyu, XU Jiangjie
    Computer Engineering and Applications    2024, 60 (6): 238-248.   DOI: 10.3778/j.issn.1002-8331.2210-0398
    A facial expression recognition method combining 3D augmented attention and semantic aggregation is proposed to address the problems that traditional convolutional networks are difficult to effectively integrate features of facial expressions of faces at different stages, have feature expression bottlenecks and cannot efficiently utilize contextual semantics. Firstly, it is optimized on the basis of rank expansion (ReXNet) network to fuse contextual features while eliminating expression bottlenecks to make it more suitable for expression recognition tasks. Secondly, to capture discriminative face expression fine-grained features, 3D augmented attention is constructed by combining non-local blocks with cross-dimensional information interaction theory. Finally, in order to fully utilize the shallow and mid-level underlying features and high-level semantic features of expressions, a semantic aggregation module is designed to aggregate multi-level global contextual features with high-level semantic information to achieve mutual semantic gain of expressions of the same class and enhance intra-class consistency. Experiments show that the accuracy of the method is 88.89%, 89.53% and 62.22% on the publicly available datasets RAF-DB, FERPlus and AffectNet-8, respectively, demonstrating the advancedness of the method.
    Semi-Supervised Object Detection Algorithm Based on Localization Confidence Weighting
    FENG Zeheng, WANG Feng
    Computer Engineering and Applications    2024, 60 (6): 249-258.   DOI: 10.3778/j.issn.1002-8331.2210-0400
    Wavelet Frequency Division Self-Attention Transformer Image Deraining Network
    FANG Siyan, LIU Bin
    Computer Engineering and Applications    2024, 60 (6): 259-273.   DOI: 10.3778/j.issn.1002-8331.2211-0099
    In view of the weak ability of vision Transformer (ViT)  to capture high-frequency information and the problem that many image deraining methods are prone to lose details, a wavelet frequency division self-attention Transformer image deraining network (WFDST-Net)  is proposed. As the main module of WFDST-Net, the wavelet frequency division self-attention Transformer (WFDST)  uses non-separable lifting wavelet transform to obtain the low-frequency and high-frequency components of feature map, and carries out self-attention interaction in the low frequency and high frequency respectively, so that the module can learn from the low frequency to restore the overall structure, and strengthen the ability to capture line details such as rain streaks in the high frequency, thus enhancing the modeling ability of different frequency domain features. WFDST-Net adopts U-shaped architecture and obtains multi-scale features through non-separable lifting wavelet transform, which can capture high-frequency rain streaks of different shapes while ensuring the integrity of information. WFDST-Net has lower parameters than other Transformers related to image deraining. In addition, the VOCRain250 dataset is proposed for the task of joint image deraining and semantic segmentation, which has advantages over the currently widely used BDD150. The experimental results show that the proposed method enhances the ability of ViT to capture different frequency domain information, and outperforms the current state-of-the-art deraining methods in the performance of synthetic and real-world datasets and joint semantic segmentation tasks. It can effectively remove complex rain streaks while retaining more background details.
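    The paper uses a non-separable lifting wavelet transform; purely to illustrate the frequency-division step that precedes the low- and high-frequency self-attention, the sketch below splits a feature map with a plain one-level Haar transform (sign conventions may differ from the paper's transform).

        import torch

        def haar_dwt(x: torch.Tensor):
            """One-level 2D Haar DWT of a (B, C, H, W) tensor with even H and W.
            Returns the low-frequency band LL and high-frequency bands LH, HL, HH."""
            a = x[..., 0::2, 0::2]  # top-left of each 2x2 block
            b = x[..., 0::2, 1::2]  # top-right
            c = x[..., 1::2, 0::2]  # bottom-left
            d = x[..., 1::2, 1::2]  # bottom-right
            ll = (a + b + c + d) / 2
            lh = (a + b - c - d) / 2
            hl = (a - b + c - d) / 2
            hh = (a - b - c + d) / 2
            return ll, lh, hl, hh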
    Lightweight Object Detection Method for Constrained Environments
    QU Haicheng, YUAN Xudong, LI Jiaqi
    Computer Engineering and Applications    2024, 60 (6): 274-281.   DOI: 10.3778/j.issn.1002-8331.2211-0283
    The lightweight design of object detection models plays an important role in environments with limited computing resources and storage space. To further compress the size of object detection models and improve detection accuracy, a higher-performance lightweight object detection model named Lite-YOLOX is proposed, which improves the feature pyramid structure, the decoupled head structure and the loss function of the YOLOX-Tiny model. Firstly, to further compress the size of the original model, the feature pyramid and decoupled head are redesigned to make the neck and head of the model lighter. Then, to improve detection accuracy, the EIoU loss function, which is more sensitive to the position of the ground-truth box, is designed to optimize the proposed model. Finally, validation experiments are performed on the Pascal VOC and safety-helmet-wearing datasets. The experimental results show that, compared with YOLOX-Tiny, Lite-YOLOX reduces the parameters by 40% and the floating-point operations by 37.5%, while mAP50 increases by 3.2 and 3.1 percentage points on the two datasets. On the NVIDIA Jetson Xavier NX, the frames per second (FPS) increases from 51 to 59, a significant improvement in real-time performance.
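    The EIoU loss mentioned above adds explicit center-distance, width and height penalties to the IoU term. A sketch of the published EIoU formulation for boxes in (x1, y1, x2, y2) form is given below; it is not taken from the paper.

        import torch

        def eiou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
            """EIoU loss for box tensors of shape (N, 4) in (x1, y1, x2, y2) format."""
            # intersection and union
            ix1 = torch.max(pred[:, 0], target[:, 0]); iy1 = torch.max(pred[:, 1], target[:, 1])
            ix2 = torch.min(pred[:, 2], target[:, 2]); iy2 = torch.min(pred[:, 3], target[:, 3])
            inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
            area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
            area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
            iou = inter / (area_p + area_t - inter + eps)
            # smallest enclosing box
            ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
            ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
            cw, ch = ex2 - ex1, ey2 - ey1
            # center-distance plus separate width and height penalties (EIoU)
            cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
            cxt, cyt = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
            rho2 = (cxp - cxt) ** 2 + (cyp - cyt) ** 2
            wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
            wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
            loss = (1 - iou
                    + rho2 / (cw ** 2 + ch ** 2 + eps)
                    + (wp - wt) ** 2 / (cw ** 2 + eps)
                    + (hp - ht) ** 2 / (ch ** 2 + eps))
            return loss.mean()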
    Multi-Object Tracking with Spatial-Temporal Embedding Perception and Multi-Task Synergistic Optimization
    LIANG Xiaoguo, LI Hui, CHENG Yuanzhi, CHEN Shuangmin, LIU Hengyuan
    Computer Engineering and Applications    2024, 60 (6): 282-292.   DOI: 10.3778/j.issn.1002-8331.2211-0385
    To solve the tracking challenges caused by frequent occlusion, crowded scenes and variable object scales in multi-object tracking, a multi-object tracking method is proposed via spatial-temporal embedding perception and multi-task synergistic optimization. Firstly, spatial correlation module is proposed to extract discriminative embedding with object context awareness in spatial. Secondly, temporal correlation module is proposed to aggregate the embedding extracted from spatial correlation module, and aggregated embedding is used to generate temporal attention to guide spatial correlation module to extract more discriminative embedding in frequent occlusion and crowded scenes. Therefore, discriminative embedding enhances association robustness while predicting more accurate detection box to overcome the scale variability issues, and accurate detection box facilitates the extraction of higher quality embedding for the proposed modules. In this way, the synergistic optimization among multiple tasks of embedding extraction, position prediction and data association is achieved. Finally, GIoU distance among detection boxes is introduced into the affinity matrix to further improve association robustness in occlusion and crowded scenes. Experimental results on MOT16, MOT17 and MOT20 datasets show that the proposed method exhibits superior tracking performance to state-of-the-art methods.
    Research on Pedestrian Multi-Object Tracking Algorithm Under OMC Framework
    HE Yuting, CHE Jin, WU Jinman, MA Pengsen
    Computer Engineering and Applications    2024, 60 (5): 172-182.   DOI: 10.3778/j.issn.1002-8331.2211-0344
    Multi-object tracking is an important and widely studied direction in computer vision, but in practical applications, rapid target motion, lighting changes and occlusion lead to poor tracking performance. Therefore, the multi-object tracking model OMC is used as the basic framework for research aimed at further improving tracking performance. Firstly, to address the uneven quality of target features in multi-object tracking, the feature extractor is optimized by integrating the GAM attention mechanism into the backbone network and replacing the upsampling method in the neck network. Secondly, to address the competition between the detection and re-identification tasks in existing methods, a recursive cross-correlation network is constructed so that the model can learn the characteristics and commonalities of the different tasks. The two sub-tasks are then optimized separately: on the one hand, a new channel attention module HS-CAM is designed to optimize the re-identification network; on the other hand, the bounding box regression loss of the detection branch is replaced with the EIoU loss function. Experiments show that on the MOT16 dataset the MOTA metric reaches 73.5%, IDF1 reaches 70.4%, and ML is 11.7%, a 1.5-percentage-point reduction compared with the OMC algorithm.
    UAV Small Object Detection Algorithm Based on Context Information and Feature Refinement
    PENG Yanfei, ZHAO Tao, CHEN Yankang, YUAN Xiaolong
    Computer Engineering and Applications    2024, 60 (5): 183-190.   DOI: 10.3778/j.issn.1002-8331.2305-0401
    Object detection in UAV aerial images has been a research hotspot in recent years. Aiming at the low detection accuracy caused by small, dense objects and complex backgrounds in the UAV perspective, a UAV small object detection algorithm based on context information and feature refinement is proposed. Firstly, a context feature enhancement module uses multi-scale dilated convolution to capture the potential relationships between a pixel and its surrounding area, supplementing the network with context information. According to the feature layers of different scales, the output weights of each level of feature map are adaptively generated to dynamically optimize the expressive ability of the feature maps. Secondly, because different feature maps have different levels of detail, a feature refinement module is used to suppress conflicting information in feature fusion and prevent small object features from being drowned out. Finally, a weighted loss function is designed to accelerate the convergence of the model and further improve the accuracy of small object detection. Extensive experiments on the VisDrone2021 dataset show that the improved model gains 8.4 percentage points in mAP50 and 5.9 percentage points in mAP50:95 over the baseline model, with an FPS of 42, effectively improving the detection accuracy of small objects in UAV aerial images.
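    The context feature enhancement module described above relies on multi-scale dilated convolution; a minimal sketch of such a block is shown below, with the dilation rates and the 1x1 fusion convolution chosen as illustrative assumptions (the paper's adaptive per-level weighting is not reproduced).

        import torch
        import torch.nn as nn

        class DilatedContextBlock(nn.Module):
            """Parallel dilated convolutions gather context at several receptive
            fields; the branch outputs are concatenated and fused back to in_ch."""
            def __init__(self, in_ch: int, dilations=(1, 2, 4, 8)):
                super().__init__()
                self.branches = nn.ModuleList([
                    nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=d, dilation=d)
                    for d in dilations])
                self.fuse = nn.Conv2d(in_ch * len(dilations), in_ch, kernel_size=1)

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))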
    Re-Parameterized YOLOv8 Pavement Disease Detection Algorithm
    WANG Haiqun, WANG Bingnan, GE Chao
    Computer Engineering and Applications    2024, 60 (5): 191-199.   DOI: 10.3778/j.issn.1002-8331.2309-0354
    Road disease detection is an important way to ensure people’s traffic safety. In order to improve the accuracy of road disease detection and achieve timely and accurate road disease detection, a pavement disease detection model of re-parameterized YOLOv8 is proposed. First of all, CNX2f module is introduced into the backbone network to improve the ability of the network to extract pavement disease features, and effectively solve the problem that the pavement disease features are easily confused with the background environmental features. Secondly, RepConv and DBB reparameterization modules are introduced to enhance the capability of multi-scale feature fusion and solve the problem of large scale difference of pavement diseases. At the same time, the shared parameter structure of the head is improved, and RBB reparameterization module is introduced to solve the problem of head parameter redundancy and improve the feature extraction capability. Finally, the SPPF_Avg module is introduced to solve the problem of pavement feature loss and enrich the multi-scale feature expression. The experimental results show that the accuracy of the improved road disease detection network is 73.3%, the recall rate is 62.3% and the mAP is 69.3%, which is 2.6, 3.0 and 2.8 percentage points higher than that of the YOLOv8 network, and the detection effect of the model is improved.
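    RepConv, DBB and RBB all rest on structural re-parameterization: parallel branches used during training are folded into a single convolution for inference. The sketch below shows only the simplest case, merging a 1x1 branch into a 3x3 kernel (BatchNorm folding omitted); it illustrates the general idea, not the paper's modules.

        import torch
        import torch.nn.functional as F

        def fuse_3x3_1x1(w3, b3, w1, b1):
            """Fold a parallel 1x1 conv into a 3x3 conv by zero-padding the 1x1
            kernel to 3x3 and summing weights and biases."""
            w1_padded = F.pad(w1, [1, 1, 1, 1])  # place the 1x1 tap at the kernel center
            return w3 + w1_padded, b3 + b1

        # Sanity check: the fused kernel reproduces the two-branch output.
        x = torch.randn(1, 8, 16, 16)
        w3, b3 = torch.randn(16, 8, 3, 3), torch.randn(16)
        w1, b1 = torch.randn(16, 8, 1, 1), torch.randn(16)
        two_branch = F.conv2d(x, w3, b3, padding=1) + F.conv2d(x, w1, b1)
        wf, bf = fuse_3x3_1x1(w3, b3, w1, b1)
        assert torch.allclose(two_branch, F.conv2d(x, wf, bf, padding=1), atol=1e-4)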
    Traffic Sign Detection Algorithm Based on Improved YOLOv5-S
    LIU Haibin, ZHANG Youbing, ZHOU Kui, ZHANG Yufeng, LYU Sheng
    Computer Engineering and Applications    2024, 60 (5): 200-209.   DOI: 10.3778/j.issn.1002-8331.2306-0293
    In the field of autonomous driving, existing traffic sign detection methods suffer from missed or incorrect detections in complex backgrounds, reducing the reliability of intelligent vehicles. To address this issue, a real-time traffic sign detection algorithm based on an enhanced YOLOv5-S is proposed. Firstly, the coordinate attention mechanism is integrated into the feature extraction network to perceive the location of the object by establishing long-range dependencies on the target, making the algorithm focus on high-priority regions. Secondly, the Focal-EIoU loss function replaces CIoU, allowing the network to focus more on high-quality classification samples, improving its ability to learn from difficult samples and reducing missed and false detections. Next, the lightweight convolution technique GSConv is integrated into the network to reduce model complexity. Finally, a new small-target detection layer is added to improve the detection of small-sized signs by using richer feature information. The experimental results show that the improved algorithm achieves 88.1% mAP@0.5 and 68.5% mAP@0.5:0.95 with a detection speed of 83 FPS, meeting the requirements of real-time and reliable detection.
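    The coordinate attention mechanism referenced above pools along the two spatial axes separately so that positional information is preserved in the attention map. A compact PyTorch sketch of that published structure follows; the reduction ratio and the use of mean pooling are assumed details.

        import torch
        import torch.nn as nn

        class CoordinateAttention(nn.Module):
            """Coordinate attention: directional pooling along H and W, a shared
            encoding, then per-axis attention maps that re-weight the input."""
            def __init__(self, channels: int, reduction: int = 32):
                super().__init__()
                mid = max(8, channels // reduction)
                self.encode = nn.Sequential(
                    nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
                self.conv_h = nn.Conv2d(mid, channels, 1)
                self.conv_w = nn.Conv2d(mid, channels, 1)

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                b, c, h, w = x.shape
                pool_h = x.mean(dim=3, keepdim=True)                       # (b, c, h, 1)
                pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (b, c, w, 1)
                y = self.encode(torch.cat([pool_h, pool_w], dim=2))        # shared 1x1 encoding
                y_h, y_w = torch.split(y, [h, w], dim=2)
                a_h = torch.sigmoid(self.conv_h(y_h))                      # (b, c, h, 1)
                a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (b, c, 1, w)
                return x * a_h * a_w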
    Multi-Coupled Feedback Networks for Image Fusion and Super-Resolution Methods
    WANG Rong, DUANMU Chunjiang
    Computer Engineering and Applications    2024, 60 (5): 210-220.   DOI: 10.3778/j.issn.1002-8331.2212-0118
    People often need high-dynamic-range, high-resolution images in daily life. However, owing to equipment limitations, high-dynamic-range images are usually obtained by multi-exposure fusion (MEF) of low-dynamic-range images, and high-resolution images by super-resolution (SR) of low-resolution images; MEF and SR are usually studied separately. To solve the problem that current models cannot achieve high dynamic range and high resolution at the same time, a multi-coupling feedback network (MCF-Net) and its method are proposed based on a study of existing approaches. The model consists of N subnets and output modules. In the method, the N downsampled images I_i^lr, I_m^lr, I_-i^lr are first input to the N subnets respectively and the low-resolution features F_i^lr, F_m^lr, F_-i^lr are extracted; the super-resolution features G_i^0, G_m^0, G_-i^0 of the corresponding images are then extracted from the low-resolution features; the fused high-resolution features G_i^t, G_m^t, G_-i^t are obtained and passed to the next MCFB until the T-th MCFB yields the fused high-resolution features G_i^T, G_m^T, G_-i^T, from which the corresponding fused super-resolution images I_i^t, I_m^t, I_-i^t are obtained; finally, the high-dynamic-range super-resolution image I_out is obtained by fusing the outputs I_i^T, I_m^T, I_-i^T of the T-th reconstruction module REC in the N subnets. The performance is evaluated on the SICE dataset and compared with 33 existing methods. The results show significant improvements in each of the following metrics: structural similarity (SSIM) reaches 0.8332, peak signal-to-noise ratio (PSNR) reaches 22.07 dB, and multi-exposure fusion similarity (MEF-SSIM) reaches 0.9378.
    Image Super-Resolution Reconstruction Algorithm with Adaptive Aggregation of Hierarchical Information
    CHEN Weijie, HUANG Guoheng, MO Fei, LIN Junyu
    Computer Engineering and Applications    2024, 60 (5): 221-231.   DOI: 10.3778/j.issn.1002-8331.2210-0155
    With the development of convolutional neural networks, image super-resolution reconstruction algorithms have made some breakthroughs. Nevertheless, the existing image super-resolution algorithms rarely distinguish the use of hierarchical features and suffer from the problem of costly multi-scale feature extraction. To address these problems, this paper proposes an image super-resolution reconstruction algorithm with adaptive aggregation of hierarchical information. Specifically, the algorithm applies a multi-level information refinement mechanism for the adaptive enhancement of features at different levels to solve the problem that the hierarchical features are not distinguishably utilized. In addition, it is proposed to construct a fine-grained multi-scale information aggregation block to solve the problem of costly multi-scale information extraction and poor feature representation capability. Finally, the algorithm focuses on contrast-enhanced recombinant attention blocks to achieve adaptive calibration of features at a lower cost by exploiting channel and spatial information. Extensive experiments show that compared with other advanced algorithms, the proposed method can achieve better metrics and visual results on five benchmark datasets such as Urban100.
    Improved UNet++ for Tree Rings Segmentation of Chinese Fir CT Images
    LIU Shuai, GE Zhedong, LIU Xiaotong, GAO Yisheng, LI Yang, LI Mengfei
    Computer Engineering and Applications    2024, 60 (5): 232-239.   DOI: 10.3778/j.issn.1002-8331.2210-0212
    To address the difficulty of accurately segmenting tree rings affected by defects such as cracks, wormholes and knots, a medical CT scanner is used to reconstruct 125 CT images of Chinese fir cross-sections, and these images form the dataset, which is expanded by preprocessing such as cropping, rotating and flipping the CT images. An improved UNet++ model is proposed for tree ring segmentation. Convolutional blocks, downsampling layers, skip connections and upsampling layers are added to the improved UNet++ model, and the learning depth is increased to 6 layers. BCEWithLogitsLoss, ReLU and RMSProp are used as the loss function, activation function and optimizer respectively. The improved UNet++ model is used to segment the tree rings of the CT-reconstructed Chinese fir cross-sections, and its performance is evaluated. The results show that the pixel accuracy of the improved UNet++ model is 97.81%, the Dice coefficient is 98.89%, the intersection over union is 95.29%, and the mean intersection over union is 84.75%; the best segmentation results are obtained by fully extracting the characteristics of Chinese fir tree rings. Compared with the U-Net and UNet++ models, the improved UNet++ model produces complete and continuous segmented rings: although most rings are cut by cracks and wormholes and cannot form a complete closed curve, fractures and noise are eliminated. The results show that the improved UNet++ model is not affected by defects such as cracks, knots and wormholes, and its segmentation results are very clear, effectively solving the mis-segmentation and under-segmentation of dense tree rings under the interference of wormhole defects.
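    For reference, the evaluation metrics quoted above (pixel accuracy, Dice coefficient and intersection over union) can be computed from binary masks as sketched below; this is illustrative and not the authors' evaluation code.

        import numpy as np

        def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
            """Pixel accuracy, Dice coefficient and IoU for binary 0/1 masks."""
            pred, gt = pred.astype(bool), gt.astype(bool)
            tp = np.logical_and(pred, gt).sum()
            pixel_acc = (pred == gt).mean()
            dice = 2 * tp / (pred.sum() + gt.sum() + eps)
            iou = tp / (np.logical_or(pred, gt).sum() + eps)
            return pixel_acc, dice, iou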
    Facial Expression Generation Based on Group Residual Block Generative Adversarial Network
    LIN Benwang, ZHAO Guangzhe, WANG Xueping, LI Hao
    Computer Engineering and Applications    2024, 60 (5): 240-249.   DOI: 10.3778/j.issn.1002-8331.2210-0234
    Facial expression generation is the generation of facial images with expressions through a certain expression calculation method, which is widely used in face editing, film and television production, and data augmentation. With the advent of generative adversarial network (GAN), facial expression generation has made significant progress, but problems such as overlapping, blurring, and lack of realism still occur in facial expression generation images. In order to address the above issues, group residuals with attention mechanism generative adversarial network (GRA-GAN) is proposed to generate high-quality facial expressions. Firstly, an adaptive mixed attention mechanism (MAT) is embedded in the generative network before downsampling and after upsampling to adaptively learn the key region features and enhance the learning of key regions of the image. Secondly, the idea of grouping is integrated into the residual network, and the group residuals block with attention mechanism (GRA) module is proposed to achieve better generation effect. Finally, the experimental verification is carried out on the public dataset RaFD. The experimental results show that the proposed GRA-GAN outperforms the related methods in both qualitative and quantitative analysis.
    Improving YOLOX-s Dense Garbage Detection Method
    XIE Ruobing, LI Maojun, LI Yiwei, HU Jianwen
    Computer Engineering and Applications    2024, 60 (5): 250-258.   DOI: 10.3778/j.issn.1002-8331.2210-0235
    To address the low recognition rate, inaccurate localization, and false and missed detections of targets in densely stacked multi-category garbage detection, a garbage detection method incorporating a multi-headed self-attention mechanism to improve YOLOX-s is proposed. Firstly, the Swin Transformer module is embedded in the feature extraction network, introducing a multi-headed self-attention mechanism based on the sliding window operation so that the network considers both global feature information and key feature information, reducing false detections. Secondly, deformable convolution is used in the prediction output network to refine the initial prediction boxes and improve localization accuracy. Finally, loss weighting coefficients are introduced on the basis of EIoU to propose a weighted IoU-EIoU loss, which adaptively adjusts the attention paid to different losses at different stages of training and further accelerates the convergence of the training network. Tests on a public 204-class garbage detection dataset show that the mean average precision of the proposed improved algorithm reaches 80.5% and 92.5% respectively, outperforming current popular target detection algorithms, and the detection speed is fast enough to meet real-time requirements.
    Ship Target Detection Method Combining Visual Saliency and EfficientNetV2
    LIANG Xiuya, FENG Shuichun, CHEN Hongzhen
    Computer Engineering and Applications    2024, 60 (5): 259-270.   DOI: 10.3778/j.issn.1002-8331.2210-0267
    With the increasing resolution of optical remote sensing images, fast and accurate detection of ship targets at sea has become one of the basic challenges of maritime research. To deal with the problems encountered in detection, such as large image size but sparse targets, complex background interference, poor timeliness of target extraction, and the heavy computation of large models, a practical ship detection scheme is proposed. Visual saliency is introduced to effectively accelerate the pre-screening process, and the difference between the ship target area and the background is effectively expressed by wavelet decomposition coefficients, which enhance the directional characteristics of targets while suppressing noise. The saliency map is generated by an improved model based on the phase spectrum of the quaternion Fourier transform (PQFT). In addition, the Gini index is used to guide multi-scale saliency map fusion to enhance scale adaptability and the saliency of small targets. Compared with other saliency methods, the proposed model effectively suppresses the interference of complex environments such as cloud, fog, sea clutter, and ship wakes. More importantly, it produces a smaller set of candidate regions than classical sliding-window or other region proposal methods. After the saliency map is obtained, the adaptive-threshold OTSU method is employed for its binary segmentation. In the target discrimination stage, the lightweight network EfficientNetV2 is used to effectively eliminate false alarms. The experimental results show that the proposed ship detection method is highly robust, with accuracy up to 96%, and meets real-time requirements.
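    The saliency stage above improves on the phase spectrum of the quaternion Fourier transform (PQFT). As a simplified point of reference only, the sketch below computes single-channel phase-spectrum saliency (keep the FFT phase, discard the amplitude, back-transform, square and smooth); the quaternion extension and the paper's improvements are not reproduced.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def phase_spectrum_saliency(gray: np.ndarray, sigma: float = 3.0) -> np.ndarray:
            """Phase-only saliency map of a grayscale image, normalized to [0, 1]."""
            f = np.fft.fft2(gray.astype(np.float64))
            phase_only = np.exp(1j * np.angle(f))        # keep phase, drop amplitude
            sal = np.abs(np.fft.ifft2(phase_only)) ** 2  # back-transform and square
            sal = gaussian_filter(sal, sigma)            # smooth to suppress noise
            return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)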
    Dual-Branch Low-Light Image Enhancement Combined with Dense Wavelet Transform
    CHEN Junjie, ZHOU Yongxia, ZU Jiazhen, SHEN Wei, ZHAO Ping
    Computer Engineering and Applications    2024, 60 (4): 200-210.   DOI: 10.3778/j.issn.1002-8331.2209-0470
    A dual-branch image enhancement method combining dense wavelet transform is proposed to solve the problems of low brightness, high noise, and color distortion in low-light images. Firstly, dense wavelet networks are used for multi-scale feature information fusion to reduce information loss and provide denoising capability. Then, the global attention module and feature extraction module are embedded in the multi-scale feature fusion to fully extract global and local features. Finally, the effect of low-light images is enhanced by color enhancement and detail reconstruction with a dual-branch structure. In addition, a new joint loss function is introduced to guide the network training from multiple aspects to enhance its performance. The experimental results show that the proposed method effectively improves the brightness of low-light images, suppresses image noise, and obtains richer details and color information. The enhanced images are clearer and more natural, and the peak signal-to-noise ratio and structural similarity have significant advantages over the mainstream methods.
    Vectorized Feature Space Embedded Clustering Based on Contrastive Learning
    ZHENG Yang, WU Yongming, XU An
    Computer Engineering and Applications    2024, 60 (4): 211-219.   DOI: 10.3778/j.issn.1002-8331.2209-0338
    The deep embedded clustering (DEC) algorithm embeds data into a low-dimensional vectorized feature space only through an autoencoder with single-instance reconstruction, and ignores the relationships between different instances, so instances in the embedding space may not be well separated. To address this, a vectorized feature space embedded clustering method based on contrastive learning (VECCL) is proposed. Contrastive learning, by identifying the dissimilarity between data instances, extracts features with clustering semantics in which similar instances are close and dissimilar instances are far apart; these are brought into DEC as prior knowledge to guide the autoencoder in initializing a low-dimensional clustering feature space that carries deep information about the data. At the same time, an entropy loss constructed from the soft classification labels and the reconstruction loss of the autoencoder are introduced into the clustering loss function as regularization terms to jointly refine the clustering. Compared with the experimental results of the DEC method on the CIFAR10, CIFAR100 and STL10 datasets, ACC increases by 48.1, 23.1 and 41.8 percentage points, NMI increases by 41.0, 25.2 and 39.0 percentage points, and ARI increases by 45.4, 16.4 and 41.8 percentage points, respectively.
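    The DEC stage referred to above clusters embeddings with a Student's-t soft assignment and a sharpened target distribution. The two standard DEC formulas are sketched below for orientation; the contrastive pre-training and the extra regularization terms of VECCL are not included.

        import torch

        def soft_assignment(z: torch.Tensor, centroids: torch.Tensor, alpha: float = 1.0):
            """DEC soft assignment q_ij: Student's-t similarity between embedding i
            and cluster centroid j, normalized over the clusters."""
            # z: (n, d), centroids: (k, d)
            dist2 = torch.cdist(z, centroids).pow(2)
            q = (1.0 + dist2 / alpha).pow(-(alpha + 1.0) / 2.0)
            return q / q.sum(dim=1, keepdim=True)

        def target_distribution(q: torch.Tensor) -> torch.Tensor:
            """Sharpened target p_ij that emphasizes confident assignments."""
            weight = q.pow(2) / q.sum(dim=0, keepdim=True)
            return weight / weight.sum(dim=1, keepdim=True)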
    Moving Object Detection Algorithm with Unsupervised Missing Value Prediction
    FU Rao, FANG Jiandong, ZHAO Yudong
    Computer Engineering and Applications    2024, 60 (4): 220-228.   DOI: 10.3778/j.issn.1002-8331.2210-0027
    In the process of moving target detection, the background is complex and the target is easily occluded. This paper proposes an autonomous detection algorithm for moving targets based on unsupervised missing value prediction. The missed targets are regarded as missing values in the tag data. According to the prior knowledge of the category and number of objects to be detected, the unsupervised generative adversarial imputation networks (GAIN) are used to predict the missing values through the acquired tag data, which greatly improves the recall rate at the expense of less accuracy. The experimental results on the small sample dataset of the characteristic parts of cattle show when the missing rate of tag data is less than 40%, the accuracy of missing value prediction is about 95%, and the average F1 score of detection is 0.92 for different degrees of occluded targets. This method has good detection performance for moving targets under the condition of small samples, which can reduce the uncertainty in practical application and the dependence of the algorithm on sample data, and improve the problem of missing detection in the process of moving target detection.
    Multi-Scale Detail Enhanced Pyramid Network for Esophageal Lesion Detection
    LI Chi, ZHOU Yingyue, YAO Hanmin, LI Xiaoxia, QIN Jiamin, ZHUANG Ming, WEN Liming
    Computer Engineering and Applications    2024, 60 (4): 229-236.   DOI: 10.3778/j.issn.1002-8331.2209-0162
    Aiming at problems such as high inter-class similarity and large intra-class scale changes in Lugol's chromoendoscopy (LCE) images, this paper proposes a method for detecting multiple esophageal diseases based on Sparse R-CNN and equipped with a multi-scale detail enhancement pyramid network (MDEPN) structure. To address the information loss and semantic differences in the feature pyramid network (FPN) structure of Sparse R-CNN, the MDEPN structure first uses a Gabor-modulated convolution module to enhance features at different scales, exploiting Gabor's strong sensitivity to direction and scale to improve the expression of texture information in the feature maps. Secondly, a directional channel pooling module extracts the local directional similarity and the correlation between local and global features across scales, reducing the semantic differences when fusing features of different scales. On a self-built dataset of multiple esophageal LCE lesions, the mAP0.50 accuracy is 65.0%, which is 2.4 percentage points higher than the baseline Sparse R-CNN and higher than other mainstream detection methods. In addition, the designed MDEPN module can be integrated into other detection models as an independent structure to improve performance, showing a degree of versatility.
    Photovoltaic Panel Segmentation Using Attention Mechanism and Global Convolution
    LI Qing, LI Haitao, LI Hui, ZHANG Junhu
    Computer Engineering and Applications    2024, 60 (4): 237-248.   DOI: 10.3778/j.issn.1002-8331.2209-0180
    Accurate photovoltaic (PV) identification is critical for the effective and healthy development of the PV industry, but PV recognition is hampered by the complex background and the variable shape and color of PV panels in high-resolution remote sensing images. This paper proposes a method for accurately extracting photovoltaic areas from high-resolution remote sensing images. The encoder and decoder of the network combine multi-layer features to aggregate rich semantic information. Important spatial and channel properties are captured using global convolution and a dual attention mechanism, while a channel fusion module recovers some of the lost channel information. The proposed method can effectively solve the problems of blurred edges and adhesion between photovoltaic panels. In experiments on open PV datasets against U-Net, SegNet, DeepLabv3 and DeepLabv3+, the proposed method achieves IoU of 87.02%, 92.98% and 88.43% on PV01, PV03 and PV08 respectively. The experimental results show that the proposed method achieves high-accuracy segmentation of photovoltaic panels in high-resolution remote sensing images.
    Aerial Image Object Detection with Feature Enhancement Using Hybrid Attention
    GUAN Wenqing, ZHOU Shibin, ZHANG Guopeng
    Computer Engineering and Applications    2024, 60 (4): 249-257.   DOI: 10.3778/j.issn.1002-8331.2209-0206
    Aiming at the characteristics of complex background, dense distribution and large scale variation in aerial images, this paper proposes a novel object detection framework named as hybrid attention network (HA-Net). Firstly, Transformer structure both with local and global attention in the backbone network is designed to enhance dense targets feature extraction ability. The Transformer structure uses attention to suppress background noises and make dense target boundaries clearer. Then, a spatial pyramid pooling block using continuous AvgPooling and MaxPooling is adopted to enrich feature information and enhance the multi-scale target representation. Moreover, a feature reconstruction module mixing cross-scale spatial attention and non-local channel attention is designed to reconstruct the feature pyramid network, so as to reduce unnecessary information interference and facilitate multi-scale target detection. The network is evaluated on a large remote sensing dataset DOTA, and the evaluation mAP reaches 76.81% and 78.28% on single-scale test and multi-scale test respectively, which surpasses the baseline model by a large margin of 2.38 percentage points and 3.62 percentage points. The evaluation mAP reaches 89.95% on HRSC2016. The improvement of detection results proves the effectiveness of HA-Net in aerial image object detection.
    Unsupervised Landscape Painting Style Transfer Network with Multiscale Semantic Information
    ZHOU Yuechuan, ZHANG Jianxun, DONG Wenxin, GAO Linfeng, NI Jinyuan
    Computer Engineering and Applications    2024, 60 (4): 258-269.   DOI: 10.3778/j.issn.1002-8331.2210-0079
    To address the texture clutter and poor quality of generated images when image-translation generative adversarial networks handle unsupervised style transfer, this paper proposes CCME-GAN (circulatory correction multiscale evaluation generative adversarial networks) based on the cycle consistency loss. Firstly, in the network architecture, a multiscale evaluation network architecture based on three levels of image semantic information is proposed to enhance the transfer effect from the source domain to the target domain. Secondly, in the loss function, a multiscale adversarial loss and a cyclic correction loss are proposed to guide the optimization of the model toward a stricter target and generate pictures with better visual quality. Finally, to prevent mode collapse, an attention mechanism is added in the encoding stage of style features to extract important feature information, and the ACON activation function is introduced at each stage of the network to strengthen its nonlinear expressive ability and avoid neuron necrosis. The experimental results show that the FID value of the proposed method is reduced by 21.80% and 34.33% compared with CycleGAN and ACL-GAN on a landscape painting style transfer dataset. In addition, to verify the generalization ability of the model, generalization experiments are conducted on two public datasets, Vangogh2Photo and Monet2Photo, where the FID values decrease by 7.58% and 18.14%, and by 4.65% and 6.99%, respectively.
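    CCME-GAN starts from the cycle consistency loss; the standard CycleGAN form of that loss is sketched below for reference (the paper's cyclic correction loss and multiscale adversarial loss are additions on top of it and are not reproduced here).

        import torch

        def cycle_consistency_loss(real_a, rec_a, real_b, rec_b, lam: float = 10.0):
            """L1 cycle-consistency: translating A->B->A (and B->A->B) should
            reconstruct the original images."""
            return lam * (torch.mean(torch.abs(real_a - rec_a)) +
                          torch.mean(torch.abs(real_b - rec_b)))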
    Reference | Related Articles | Metrics
    Multi-Scale Liver Tumor Segmentation Algorithm by Fusing Convolution and Transformer
    CHEN Lifang, LUO Shiyong
    Computer Engineering and Applications    2024, 60 (4): 270-279.   DOI: 10.3778/j.issn.1002-8331.2210-0084
    Abstract84)      PDF(pc) (2544KB)(62)       Save
    Accurate automatic segmentation of the liver and liver tumors is important for helping physicians diagnose, treat, and monitor liver cancer postoperatively. Due to the intrinsic locality of convolution, existing convolution-based methods struggle to establish long-range dependencies, while the Transformer's cascaded attention mechanism can establish global associations but tends to destroy local details. To address this, a feature modeling method that fuses convolution and Transformer is proposed. The method interactively fuses local and global representations through mixed embedding to maximize global dependencies at different resolutions. Meanwhile, contextual information from different encoding stages is captured by a multi-level feature fusion module at the skip connections to obtain richer semantic information. Finally, to cope with the variation of liver tumors in size and shape, a deformable multi-scale module is used to extract multi-scale tumor features. The experiments mainly use the Dice similarity coefficient (DSC) as the evaluation metric. The DSCs for the liver and tumors on the LiTS17 dataset are 0.920 and 0.748, respectively, showing that the proposed network segments liver tumors more accurately than the baseline.
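    A minimal sketch of the Dice similarity coefficient (DSC) used as the evaluation metric; the 0.5 binarization threshold and the smoothing constant are illustrative assumptions.

    import torch

    def dice_coefficient(pred, target, eps=1e-6):
        """DSC = 2|P ∩ T| / (|P| + |T|) over binary masks."""
        pred = (pred > 0.5).float()               # binarize the predicted probabilities
        target = target.float()
        intersection = (pred * target).sum()
        return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

    if __name__ == "__main__":
        p = torch.tensor([[0.9, 0.1], [0.8, 0.2]])
        t = torch.tensor([[1.0, 0.0], [1.0, 1.0]])
        print(dice_coefficient(p, t).item())      # ≈ 0.8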
    Reference | Related Articles | Metrics
    Underwater Image Enhancement Based on Parallel Guidance of Transformer and CNN
    CHANG Jian, CHEN Hongfu, WANG Bingbing
    Computer Engineering and Applications    2024, 60 (4): 280-288.   DOI: 10.3778/j.issn.1002-8331.2302-0036
    Abstract47)      PDF(pc) (2540KB)(39)       Save
    To overcome the problems of low contrast and color deviation in underwater images, a parallel guided underwater image enhancement algorithm based on the Transformer and convolutional neural networks (CNN) is proposed. A 3D position embedding module provides the Transformer with relative position information, color deviation information, and global features of the feature maps, while a CNN encoder extracts local features of the image. The global features extracted by the Transformer and the local features extracted by the CNN are integrated through a feature modulation matrix, the resolution of the image is restored by a CNN decoder, and the feature maps output by the decoder are fed into a feature enhancement network whose output gives the final result. The network is trained on the existing EUVP paired dataset. To verify the superiority of the algorithm, underwater images with varying degrees of color deviation are selected for qualitative and quantitative experiments. The results show that the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) of the enhanced underwater images are higher than those of the comparison algorithms, and the subjective quality is significantly improved; the proposed algorithm can generate enhanced images with rich colors and high clarity.
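    A minimal sketch of the peak signal-to-noise ratio (PSNR) metric reported above, assuming images normalized to [0, 1]; function and variable names are illustrative.

    import torch

    def psnr(enhanced, reference, max_val=1.0):
        """Peak signal-to-noise ratio in dB between an enhanced image and its reference."""
        mse = torch.mean((enhanced - reference) ** 2)
        return 10.0 * torch.log10(max_val ** 2 / mse)

    if __name__ == "__main__":
        ref = torch.rand(3, 64, 64)
        noisy = (ref + 0.05 * torch.randn_like(ref)).clamp(0, 1)   # mildly degraded copy
        print(f"{psnr(noisy, ref).item():.2f} dB")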
    Reference | Related Articles | Metrics
    Improved YOLOv5 Mixed Sample Training for Detection of Insulator Umbrella Plate Falling Defects
    LI Xun, GAN Rundong, QIAN Junfeng, ZHANG Shiheng, ZHAO Wenbin, WANG Daolei
    Computer Engineering and Applications    2024, 60 (4): 289-297.   DOI: 10.3778/j.issn.1002-8331.2302-0165
    Abstract57)      PDF(pc) (2478KB)(49)       Save
    To accurately locate and identify insulator strings and umbrella plate falling defects during transmission line inspection, this paper proposes an insulator defect detection model based on improved YOLOv5 with mixed sample training. Firstly, to address the scarcity of insulator defect images, a mixed sample data generation method is proposed, which combines the GrabCut algorithm with image fusion technology to expand the dataset. Then, according to the shape characteristics of insulators and their defects, the long-edge definition method and the circular smooth label (CSL) are used to redefine the coordinate parameters of the model's feature extraction region; by adding angle information, more accurate feature extraction is achieved. Finally, the CSPDarkNet backbone network is optimized by fusing some feature layers in the backbone with the features extracted by the path aggregation network (PAN). The improved YOLOv5 CSPDarkNet model increases the detection accuracy of insulator defects by 2.8 percentage points compared with the model before improvement, with a detection speed of 20.5 FPS. The experimental results show that the improved insulator defect identification method basically meets the needs of practical application.
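    A rough sketch of how a circular smooth label (CSL) for a box angle can be constructed, so that angles near the 0°/180° boundary remain adjacent; the 180-bin resolution and Gaussian window radius are assumptions for demonstration, not values from the paper.

    import numpy as np

    def circular_smooth_label(angle_deg, num_bins=180, radius=6):
        """Gaussian-smoothed one-hot angle label that wraps around, so 179° and 0° stay adjacent."""
        bins = np.arange(num_bins)
        target = int(round(angle_deg)) % num_bins
        # circular distance between each bin and the target angle bin
        dist = np.minimum(np.abs(bins - target), num_bins - np.abs(bins - target))
        label = np.exp(-(dist ** 2) / (2 * radius ** 2))
        label[dist > radius] = 0.0                 # truncate the window outside the radius
        return label

    if __name__ == "__main__":
        lbl = circular_smooth_label(179)
        print(lbl[179], lbl[0], lbl[90])           # neighbouring bins get soft credit, far bins get none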
    Reference | Related Articles | Metrics
    Class Incremental Learning by Adaptive Feature Consolidation with Parameter Optimization
    XU An, WU Yongming, ZHENG Yang
    Computer Engineering and Applications    2024, 60 (3): 220-227.   DOI: 10.3778/j.issn.1002-8331.2208-0469
    Abstract35)      PDF(pc) (719KB)(32)       Save
    Aiming at the catastrophic forgetting problem of deep network models on image classification tasks in incremental scenarios, a class incremental learning method with adaptive feature consolidation and weight selection is proposed. Firstly, the method uses knowledge distillation as the basic framework to integrate the output features of the backbone and classification networks of the models before and after a task, and uses distillation constraints with a custom disparity loss so that the current model retains the generalization ability of the old model. In the incremental learning phase, the importance of the neural network parameters is evaluated, and changes to important parameters are penalized when learning a new task, thus effectively preventing the new model from overwriting important knowledge related to previous tasks. The experimental results show that the proposed method can exploit the incremental learning ability of the model and effectively alleviate catastrophic forgetting.
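    A minimal sketch of the importance-weighted penalty on parameter changes described above, in the spirit of consolidation-based regularizers; the importance estimates, the function name, and the weighting factor lam are placeholders rather than the authors' exact formulation.

    import torch
    import torch.nn as nn

    def consolidation_penalty(model, old_params, importance, lam=100.0):
        """Penalize movement of parameters deemed important for previous tasks."""
        loss = 0.0
        for name, param in model.named_parameters():
            if name in old_params:
                loss = loss + (importance[name] * (param - old_params[name]) ** 2).sum()
        return lam * loss

    if __name__ == "__main__":
        net = nn.Linear(4, 2)
        old = {n: p.detach().clone() for n, p in net.named_parameters()}   # snapshot after old task
        imp = {n: torch.ones_like(p) for n, p in net.named_parameters()}   # placeholder importance
        print(consolidation_penalty(net, old, imp).item())                 # 0.0 before any update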
    Reference | Related Articles | Metrics
    Improved FCENet Algorithm for Natural Scene Text Detection
    ZHOU Yan, LIAO Junwei, LIU Xiangyu, ZHOU Yuexia, ZENG Fanzhi
    Computer Engineering and Applications    2024, 60 (3): 228-236.   DOI: 10.3778/j.issn.1002-8331.2209-0043
    Abstract31)      PDF(pc) (877KB)(26)       Save
    Aiming at the detection problems caused by complex backgrounds, variable scales, and curved shapes in natural scene text detection, this paper proposes an improved FCENet (Fourier contour embedding network) scene text detection algorithm. The algorithm is based on FCENet and introduces a multi-scale residual feature enhancement module and a multi-scale attention feature fusion module. Serving as the residual branch at the top of the backbone network, the multi-scale residual feature enhancement module strengthens the top-down flow of high-level semantic information in the feature pyramid, improves text pixel classification, and effectively reduces false detections. The multi-scale attention feature fusion module allows features of different semantics and scales to be fused better; combined with the bottom-up feature fusion network, it effectively avoids text over-segmentation and improves the detection of curved text. Experimental results show that the comprehensive F-measure of the proposed method reaches 86.2% and 86.5% on the curved text datasets CTW1500 and Total-Text, respectively, which is 1.1 and 0.7 percentage points higher than the original FCENet.
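    A rough sketch of the Fourier contour representation that FCENet is named after: a closed text contour is expressed by a few complex Fourier coefficients and resampled from them. The coefficient count, ordering, and sampling density here are assumptions for demonstration, not the paper's exact parameterization.

    import numpy as np

    def contour_from_fourier(coeffs, num_points=100):
        """Reconstruct contour points c(t) = sum_k coeffs[k] * exp(2j*pi*k*t) for t in [0, 1)."""
        t = np.linspace(0.0, 1.0, num_points, endpoint=False)
        ks = np.arange(-(len(coeffs) // 2), len(coeffs) // 2 + 1)   # frequencies ..., -1, 0, +1, ...
        basis = np.exp(2j * np.pi * np.outer(ks, t))                # shape (K, num_points)
        contour = coeffs @ basis                                    # complex points x + iy
        return np.stack([contour.real, contour.imag], axis=-1)

    if __name__ == "__main__":
        # a single first-order pair of coefficients gives a circle of radius 1 centred at (2, 3)
        coeffs = np.array([0.0 + 0.0j, 2.0 + 3.0j, 1.0 + 0.0j])     # k = -1, 0, +1
        pts = contour_from_fourier(coeffs)
        print(pts.shape, pts[:2])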
    Reference | Related Articles | Metrics