Contents of the Graphics and Image Processing section in this journal

    Improved YOLOv7 for UAV Image Object Detection
    ZOU Zhentao, LI Zeping
    Computer Engineering and Applications    2024, 60 (8): 173-181.   DOI: 10.3778/j.issn.1002-8331.2305-0264
    Aerial image object detection has significant practical value for the efficient interpretation of aerial images and for applications such as mapping, resource inventory, and urban and rural planning. To address the challenges of UAV aerial images, such as varying object scales, background interference, and missed detection of small targets, an improved algorithm based on YOLOv7, called AirYOLOv7, is proposed. Firstly, AirYOLOv7 adds a three-dimensional attention mechanism during feature extraction and a channel attention mechanism during feature fusion in the original network, helping the model focus on crucial information in the image. Secondly, because small objects are prevalent in aerial images, an additional prediction head for small objects is added, and a C3STB module is placed before each prediction head to improve the detection of objects at different scales. Additionally, to address the sensitivity of the IoU loss to positional deviations of small objects, the Wasserstein distance is introduced into the original bounding box regression loss, further improving small-object detection. Experiments on two publicly available optical aerial datasets, DOTA and VisDrone, show that AirYOLOv7 achieves mean average precision of 78.65% and 51.79%, respectively, improvements of 1.92 and 2.28 percentage points over the original YOLOv7, validating the effectiveness of the proposed improvements on optical aerial images.
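    The abstract does not spell out the Wasserstein term; a minimal sketch, assuming the commonly used normalized Gaussian Wasserstein distance formulation in which each box (cx, cy, w, h) is modeled as a 2D Gaussian, could look like the following (the constant C and how the term is blended into the box loss are assumptions, not details from the paper):

```python
import torch

def wasserstein_box_distance(pred, target):
    """Squared 2-Wasserstein distance between boxes modeled as 2D Gaussians.

    Boxes are (cx, cy, w, h); each is treated as N([cx, cy], diag(w/2, h/2)^2),
    so the distance reduces to an L2 distance in (cx, cy, w/2, h/2) space.
    """
    d_center = (pred[..., 0] - target[..., 0]) ** 2 + (pred[..., 1] - target[..., 1]) ** 2
    d_shape = (pred[..., 2] - target[..., 2]) ** 2 / 4 + (pred[..., 3] - target[..., 3]) ** 2 / 4
    return d_center + d_shape

def nwd_loss(pred, target, constant=12.8):
    """1 - exp(-sqrt(W2^2) / C); C is a dataset-dependent constant (assumed here)."""
    w2 = wasserstein_box_distance(pred, target)
    return 1.0 - torch.exp(-torch.sqrt(w2.clamp(min=1e-7)) / constant)
```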
    Improved YOLOv8 Lightweight UAV Target Detection Algorithm
    HU Junfeng, LI Baicong, ZHU Hao, HUANG Xiaowen
    Computer Engineering and Applications    2024, 60 (8): 182-191.   DOI: 10.3778/j.issn.1002-8331.2310-0063
    To address the computational complexity and difficult deployment of UAV target detection algorithms, as well as the low detection accuracy caused by the long-tailed distribution of UAV data, a lightweight UAV target detection algorithm based on improved YOLOv8 (PC-YOLOv8-n) is proposed, which balances detection accuracy and computation and generalizes to long-tail-distributed data. Partial convolution layers (PConv) replace the 3×3 convolution layers in YOLOv8 to lighten the network and reduce redundancy and computational complexity. A dual-channel feature pyramid is fused, top-down paths are added to combine deep and shallow information, and a lightweight attention mechanism is introduced in the same layer to improve the feature extraction ability of the network. The equalized focal loss (EFL) is used as the category loss function, equalizing the gradient weights of tail categories during training to improve category detection. The experimental results show that PC-YOLOv8-n performs well on the VisDrone2019 dataset, improving mAP50 accuracy by 1.6 percentage points over the original YOLOv8-n algorithm, while the parameters and computation of the model are reduced to 2.6×10^6 and 7.6 GFLOPs, respectively, and the detection speed reaches 77.2 FPS.
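    Partial convolution, as named above, applies a regular convolution to only a fraction of the channels and passes the rest through unchanged; a minimal PyTorch sketch (the 1/4 split ratio is an assumption) is:

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: a 3x3 conv over the first 1/n_div of the channels,
    identity over the remaining channels (reduces FLOPs and memory access)."""
    def __init__(self, channels, n_div=4):
        super().__init__()
        self.dim_conv = channels // n_div
        self.dim_untouched = channels - self.dim_conv
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3, padding=1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_untouched], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)
```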
    Lightweight YOLO-v7 for Digital Instrumentation Detection and Reading
    ZHANG Ruining, YAN Kun, YE Jin
    Computer Engineering and Applications    2024, 60 (8): 192-201.   DOI: 10.3778/j.issn.1002-8331.2304-0401
    Because of their large parameter counts and high computational complexity, generic detection and recognition models are difficult to deploy directly on mobile devices. To address this, a computer-vision method for instrument detection and reading on mobile devices is investigated. A lightweight meter detection network and a character detection and recognition network are redesigned based on YOLO-v7 to meet the needs of detection and recognition in real industrial production environments. Depthwise separable convolution is then used to further reduce computational complexity and compress the model size, a K-means++ clustering algorithm combined with a genetic algorithm is used to automatically generate the initial anchor boxes, and channel pruning is finally used to compress the model once more. The experimental results demonstrate that the dedicated network design, depthwise separable convolution, and channel pruning significantly reduce model size and computational requirements. The numbers of parameters of both networks decrease by 99.67% compared with the original YOLO-v7 model, and the computational requirements of both are reduced to 0.3 GFLOPs, a decrease of 99.71%. The average image detection time in the experiments is 10.7 ms. The average precision (mAP0.5) of the two networks reaches 99.63% and 99.53%, and the overall system reading accuracy reaches 98.44%.
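    The depthwise separable convolution mentioned above factorizes a standard convolution into a per-channel (depthwise) convolution followed by a 1×1 (pointwise) convolution; a minimal sketch (the normalization and activation choices are assumptions):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv (groups=in_ch) followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```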
    Improved YOLOv8 Object Detection Algorithm for Traffic Sign Target
    TIAN Peng, MAO Li
    Computer Engineering and Applications    2024, 60 (8): 202-212.   DOI: 10.3778/j.issn.1002-8331.2309-0415
    Although detection technology is becoming increasingly mature, detecting small targets in complex environments remains a difficult research problem. Aiming at road traffic scenarios in which traffic signs are mostly small targets subject to strong environmental interference, an improved traffic sign detection algorithm based on YOLOv8 is proposed. Because small targets are prone to missed detection, the bi-level routing attention (BRA) mechanism is used to improve the network's perception of small targets. In addition, the deformable convolution V3 (DCNV3) module is adopted; its stronger feature extraction ability for irregular shapes allows the backbone network to better adapt to irregular spatial structures and attend more accurately to important objects, improving the detection of occluded and overlapping targets. Both the DCNV3 and BRA modules improve accuracy without increasing the model weight. At the same time, the Inner-IoU loss function based on auxiliary bounding boxes is introduced. Small-sample training, large-sample training, single-target detection, and multi-target detection experiments are conducted on four datasets (RoadSign, CCTSDB, TSDD, and GTSDB), all showing improvements, with the best results on RoadSign. The improved YOLOv8 model reaches mAP50 of 90.7% and mAP50:95 of 75.1%, increases of 5.9 and 4.8 percentage points over the baseline model, respectively. The experimental results show that the improved YOLOv8 model effectively performs traffic sign detection in complex road scenarios.
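    The Inner-IoU idea named above computes IoU on auxiliary boxes that keep the original centers but scale the widths and heights; a minimal sketch (the ratio value of 0.7 and the box format are assumptions) is:

```python
import torch

def inner_iou(pred, target, ratio=0.7, eps=1e-7):
    """IoU computed on auxiliary boxes that share the original centers but have
    widths/heights scaled by `ratio` (boxes given as cx, cy, w, h)."""
    def to_corners(box):
        cx, cy, w, h = box.unbind(-1)
        w, h = w * ratio, h * ratio
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

    px1, py1, px2, py2 = to_corners(pred)
    tx1, ty1, tx2, ty2 = to_corners(target)
    inter_w = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0)
    inter_h = (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter + eps
    return inter / union  # plug into an IoU-based regression loss as 1 - inner_iou
```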
    Siamese Networks for Object Tracking on Statistical Characteristics of Distributions
    LI Jun, CAO Lin, ZHANG Fan, DU Kangning, GUO Yanan
    Computer Engineering and Applications    2024, 60 (8): 213-224.   DOI: 10.3778/j.issn.1002-8331.2211-0435
    Although siamese trackers have achieved great success, their tracking performance is inferior in complex scenes such as those with ambiguous boundaries. Most existing methods use the inflexible Dirac distribution for target localization; lacking an uncertainty estimate of the bounding box, they cannot accurately locate targets under ambiguous boundaries. For this purpose, this paper improves SiamBAN. Firstly, the representation of the bounding box is changed from a Dirac distribution to a general distribution over a certain range, exploiting the fact that the distribution statistics of the bounding box are highly correlated with the actual localization quality. Secondly, a higher localization quality estimation score is generated by feeding the distribution statistics into a distribution-guided quality predictor. Finally, classification and localization quality estimation are represented jointly, overcoming the inconsistency between classification and localization in the training and testing stages. Extensive experiments on the visual tracking datasets VOT2018, VOT2019, OTB100, UAV123, LaSOT, TrackingNet, and GOT-10k demonstrate that the proposed method surpasses SiamBAN by 3.3% to 10% in terms of accuracy and EAO.
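    A minimal sketch of a general-distribution box head in the spirit of generalized-focal-loss-style representations (the bin count, top-k statistic, and predictor shape are illustrative assumptions, not this paper's exact design):

```python
import torch
import torch.nn as nn

class DistributionBoxHead(nn.Module):
    """Each box side is predicted as a discrete distribution over (reg_max + 1)
    bins; the expectation gives the offset, and simple distribution statistics
    (top-k probabilities) feed a small localization-quality predictor."""
    def __init__(self, in_ch, reg_max=16, topk=4):
        super().__init__()
        self.reg_max, self.topk = reg_max, topk
        self.reg_conv = nn.Conv2d(in_ch, 4 * (reg_max + 1), 1)
        self.quality = nn.Sequential(
            nn.Conv2d(4 * topk, 64, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1), nn.Sigmoid())
        self.register_buffer("project", torch.arange(reg_max + 1, dtype=torch.float))

    def forward(self, feat):
        b, _, h, w = feat.shape
        logits = self.reg_conv(feat).view(b, 4, self.reg_max + 1, h, w)
        prob = logits.softmax(dim=2)
        # Expectation over bins -> continuous offsets for the 4 box sides.
        offsets = (prob * self.project.view(1, 1, -1, 1, 1)).sum(dim=2)
        # Distribution statistics: top-k probabilities per side, flattened.
        stats = prob.topk(self.topk, dim=2).values.flatten(1, 2)
        quality = self.quality(stats)
        return offsets, quality
```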
    Lightweight Semi-Supervised Semantic Segmentation Algorithm Based on Dual-Polarization Self-Attention
    MA Dongmei, LI Yueyuan, CHEN Xi
    Computer Engineering and Applications    2024, 60 (8): 225-233.   DOI: 10.3778/j.issn.1002-8331.2211-0439
    Aiming at the high complexity, low training accuracy, and large parameter counts of current semi-supervised semantic segmentation methods, a lightweight semi-supervised semantic segmentation algorithm integrating a dual-polarization self-attention mechanism is proposed. Firstly, the model uses a ResNet-101 residual network built with position-aware cyclic convolution as the segmentation backbone to extract deep features. Secondly, a dual-polarization self-attention mechanism over channel and space is integrated, maintaining high internal resolution in the polarized channel and spatial attention branches. Finally, position-aware cyclic convolution is combined with channel attention to improve segmentation accuracy, reduce computing cost, and lower hardware requirements. Experimental results on the public PASCAL VOC 2012 dataset show that the mean intersection over union of the algorithm reaches 76.32%, 2.52 percentage points higher than the baseline model, while the number of parameters is reduced by 9% and the hardware memory occupied by the model is reduced by 61.6%. Compared with the latest algorithms in the field, the proposed model shows clear advantages in accuracy, model complexity, and parameter count.
    Human Detection of Damage Behavior for Vending Cabinets Based on Improved YOLOv4-Tiny
    YIN Min, JIA Xinchun, ZHANG Xueli, FENG Jiangtao, FAN Xiaoyu
    Computer Engineering and Applications    2024, 60 (8): 234-241.   DOI: 10.3778/j.issn.1002-8331.2212-0057
    The safety inspection of unmanned vending cabinets has always been a hot topic in the retail field. Existing manual monitoring cannot promptly and effectively capture consumers' damage behavior toward self-service vending cabinets and the products inside them. To address this, a human damage-behavior detection method for vending cabinets based on an improved YOLOv4-Tiny is proposed. First of all, surveillance video collected in real scenes is preprocessed and a dataset, DMGE-Act, is produced to make up for the lack of scene image data. Then an improved model based on YOLOv4-Tiny, YOLOv4-TinyX, is proposed: the activation function of the network is replaced with a smooth approximation, a CBAM module is introduced after the largest feature extraction layer of the backbone, two CA attention modules are introduced after the upsampling layers of the enhanced feature extraction network, and the data imbalance is corrected, which effectively improves the feature extraction and detection capabilities of the algorithm. Comparative experiments show that the parameters of the improved model increase by only 2×10^4, while the mean average precision (mAP) increases by 10.29 percentage points. The results show that the algorithm remains lightweight while the detection accuracy of damage behavior is significantly improved.
    Improved Tracktor-Based Pedestrian Multi-Objective Tracking Algorithm
    SHEN Haiyun, HUANG Zhongyi, WANG Haichuan, YU Honghao
    Computer Engineering and Applications    2024, 60 (8): 242-249.   DOI: 10.3778/j.issn.1002-8331.2212-0096
    In multi-target video tracking, interaction, occlusion, and other influences cause detection bias and thus the loss of target identities. To address this, an improved Tracktor-based pedestrian multi-target tracking algorithm is proposed. Firstly, a dynamic update module is designed in the detection-box regression to further detect and locate the proposed boxes using siamese networks. Then, a temporal information enhancement module is used to update a more suitable template for the current frame and establish global contextual relationships, and feature fusion is performed through pixel correlation to enhance target edge and scale information. Finally, camera motion compensation and a fused similarity matrix are adopted to build a secondary-association tracking mechanism that establishes stronger correlation between detection boxes and trajectories and improves the robustness of target tracking. Experiments on the public MOT16 dataset show that, in comparison with current mainstream algorithms, the proposed algorithm achieves better tracking accuracy and good robustness with a stable frame rate of 24 FPS.
    Gesture Recognition of Traffic Police Based on Spatio-Temporal Feature Fusion
    DU Bing, ZHAO Ji
    Computer Engineering and Applications    2024, 60 (8): 250-257.   DOI: 10.3778/j.issn.1002-8331.2212-0110
    In recent years, with the development of human pose estimation technology, gesture recognition based on skeleton keypoints has emerged. This paper proposes a GCPM-AGRU model for traffic police gesture recognition. To locate human keypoints more accurately, the convolutional pose machine (CPM) is improved. Firstly, residual connections, channel split, and channel shuffle are added to the feature extraction module so that it can better extract image features. In addition, a parallel multi-branch Inception4d structure is added in the first stage of the CPM, giving the network multi-scale feature fusion and effectively alleviating keypoint localization errors. Secondly, an attention-based GRU is proposed, which allocates a different weight to each frame so that frames receive different degrees of attention and better temporal information is obtained. Finally, the spatio-temporal feature information is combined for traffic police gesture recognition. The accuracy of traffic police gesture recognition reaches 93.7%, 2.95 percentage points higher than the network before improvement.
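    A minimal sketch of an attention-weighted GRU over per-frame keypoint features, as described above (the hidden size and class count are placeholders, not values from the paper):

```python
import torch
import torch.nn as nn

class AttentionGRU(nn.Module):
    """GRU over per-frame keypoint features; a learned attention weight per frame
    re-weights the hidden states before classification."""
    def __init__(self, in_dim, hidden=128, num_classes=8):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                       # x: (batch, frames, in_dim)
        h, _ = self.gru(x)                      # (batch, frames, hidden)
        w = torch.softmax(self.attn(h), dim=1)  # per-frame attention weights
        context = (w * h).sum(dim=1)            # attention-weighted temporal pooling
        return self.fc(context)
```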
    ConvUCaps: Medical Image Segmentation Model Based on Convolutional Capsule Network
    DENG Xiquan, CHEN Gang
    Computer Engineering and Applications    2024, 60 (8): 258-266.   DOI: 10.3778/j.issn.1002-8331.2302-0002
    In the field of medical image segmentation, U-Net is one of the most successful and widely studied methods at present. However, U-Net is essentially a modified fully convolutional neural network: to obtain a more comprehensive and accurate local-whole relationship, the network must be deepened, which increases computation while yielding limited benefit. Capsule networks provide an effective way to model the local-whole relationship of images and can achieve good performance with fewer parameters, but the original capsule network does not fully consider the granularity of local image features, and its application to medical image segmentation needs further improvement. Therefore, this paper proposes ConvUCaps, a medical image segmentation model that combines U-Net and the capsule network. The model improves the encoder of U-Net, using convolutional modules to learn local features at different scales and capsule modules to learn high-level features and model the local-whole relationship. The experimental results show that, compared with the U-Net, UNet++, SegCaps, and Matwo-CapsNet networks, ConvUCaps improves segmentation accuracy and convergence speed, while its inference time is significantly lower than that of segmentation models based solely on capsule networks.
    Collaborative Correction Technology of Label Omission in Dataset for Object Detection
    ZHOU Dingwei, HU Jing, ZHANG Liangrui, DUAN Feiya
    Computer Engineering and Applications    2024, 60 (8): 267-273.   DOI: 10.3778/j.issn.1002-8331.2302-0056
    Label omissions caused by fatigue, carelessness, and other factors during image labeling make it difficult to correctly distinguish positive and negative samples during model training, which degrades model performance. A collaborative correction technique is designed that updates the training set over multiple iterations, erases potentially unlabeled objects, reduces erroneous supervision in the training set, and avoids repeated manual inspection and labeling. The method does not need algorithm parameter tuning, does not depend on a specific network structure, and reduces dataset errors at low cost to improve training accuracy. Experiments based on the YOLOv5 algorithm show that the collaborative correction operation can improve detection accuracy by 0.4% to 1.4% on multiple common datasets after only one iteration, and it remains effective when the label omission rate of the dataset reaches 40%. The method places no limit on the amount of data or the number of sample categories and can be applied to multiple object detection scenarios such as e-commerce, remote sensing, and general-purpose detection, maintaining good robustness and generalization.
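    A schematic sketch of one plausible realization of such an iterative correction loop; the confidence threshold, the IoU-based matching rule, and the helper functions (train_fn, predict_fn, iou_fn, erase_region) are hypothetical placeholders, not the authors' exact procedure:

```python
def collaborative_correction(dataset, train_fn, predict_fn, iou_fn,
                             rounds=3, conf_thr=0.8, iou_thr=0.5):
    """Iteratively erase confident, unannotated detections from training images
    so they are not treated as negative samples in the next training round."""
    for _ in range(rounds):
        model = train_fn(dataset)                    # train on current training set
        for sample in dataset:
            preds = predict_fn(model, sample.image)  # detections on a training image
            for box, score in preds:
                # A confident detection that matches no annotation is a
                # potentially missed label: erase that region from the image.
                matched = any(iou_fn(box, gt) > iou_thr for gt in sample.labels)
                if score > conf_thr and not matched:
                    sample.erase_region(box)
    return dataset
```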
    Lightweight Traffic Monitoring Object Detection Algorithm Based on Improved YOLOX
    HU Weichao, GUO Yuyang, ZHANG Qi, CHEN Yanyan
    Computer Engineering and Applications    2024, 60 (7): 167-174.   DOI: 10.3778/j.issn.1002-8331.2308-0081
    Traffic target detection technology is an important tool for traffic management departments in key tasks such as traffic monitoring and safety surveillance. Faced with the large amount of traffic monitoring data, traffic target detection techniques need fast detection speed, high accuracy, and low computational resource usage. To meet this need, this paper proposes PL-YOLO, a lightweight traffic target detection algorithm for traffic monitoring scenes based on the YOLOX algorithm and the PP-LCNet network. Furthermore, considering the dense distribution and small size of vehicles in traffic monitoring scenes, the SimAM attention module is added to focus on more meaningful features. Experimental results demonstrate that, compared with the YOLOX-s model, PL-YOLO achieves a 1.89-percentage-point increase in detection accuracy, a 54% reduction in model size, and an FPS increase from 20.88 frame/s to 26.68 frame/s.
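    SimAM, mentioned above, is a parameter-free attention: a per-pixel energy is derived from channel-wise spatial statistics and used as a sigmoid gate. A minimal sketch following its commonly used implementation (the lambda value is the usual default, an assumption here):

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention: per-pixel energy from channel-wise
    spatial statistics, applied as a sigmoid gate on the input."""
    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):
        _, _, h, w = x.size()
        n = h * w - 1
        d = (x - x.mean(dim=[2, 3], keepdim=True)).pow(2)
        v = d.sum(dim=[2, 3], keepdim=True) / n
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)
```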
    Improved YOLOv7 Algorithm for Wood Surface Defect Detection
    JIANG Xingwang, ZHAO Xingqiang
    Computer Engineering and Applications    2024, 60 (7): 175-182.   DOI: 10.3778/j.issn.1002-8331.2309-0185
    High-quality wood is in great demand, but various surface defects lead to low yield and low utilization of high-quality wood. Deep-learning object detection algorithms can achieve rapid and stable detection of wood surface defects, thereby improving the quality and utilization of wood. To address the poor detection accuracy caused by the small size, dense distribution, and complexity of wood surface defect targets, a wood surface defect detection model, YOLOv7-ESS, based on improved YOLOv7 is proposed. Firstly, since extreme aspect ratios degrade the detection of wood crack defects, an attention module, ECBAM, is embedded to enhance the model's feature extraction by increasing attention to extreme-aspect-ratio defects. Secondly, to counter the severe loss of feature information for small surface defects during feature extraction, a shallow weighted feature fusion network, SFPN, is introduced, which uses deep feature maps as output and effectively exploits shallow feature information to improve the recognition accuracy of small defects. Finally, the SIoU loss function is introduced to improve the convergence speed and accuracy of the model. The results show that the average detection accuracy of the YOLOv7-ESS model is 94.7%, 11.2 percentage points higher than YOLOv7, meeting the defect detection requirements of wood production and processing.
    DY-YOLOv5: Target Detection for Aerial Images Based on Multiple Attention
    ZHAO Xin, CHEN Lili, YANG Weichuan, ZHANG Chengwang
    Computer Engineering and Applications    2024, 60 (7): 183-191.   DOI: 10.3778/j.issn.1002-8331.2309-0419
    Aiming at the low detection accuracy caused by small targets, varied scales, and complex backgrounds in UAV aerial images, a target detection algorithm for UAV aerial images based on improved YOLOv5 is proposed. The algorithm introduces Dynamic Head, a detection head with multiple attention mechanisms, to replace the original head and improve detection performance in complex backgrounds. An upsampling and concatenation operation is added to the neck of the original model, and multi-scale feature detection covering tiny, small, and medium targets is performed to improve the feature extraction ability for small and medium targets. DenseNet is introduced and integrated with the C3 module of the YOLOv5s backbone to form the C3_DenseNet module, which enhances feature transfer and prevents overfitting. On the VisDrone 2019 dataset, the DY-YOLOv5 algorithm reaches a mean average precision (mAP) of 43.9%, 11.4 percentage points higher than the original algorithm, and a recall of 41.7%, 9.0 percentage points higher. Experimental results show that the improved algorithm significantly improves the accuracy of target detection in UAV aerial images.
    Hyperspectral Image Classification Based on Double Branch Multidimensional Attention Feature Fusion
    MA Yamei, WANG Shuangting, DU Weibing
    Computer Engineering and Applications    2024, 60 (7): 192-203.   DOI: 10.3778/j.issn.1002-8331.2211-0139
    To improve the classification performance of small sample classes of hyperspectral images and to enhance the robustness of the model feature representation, a neural network classification model with two-branch multidimensional attentional feature fusion (DBMD) is proposed. DBMD uses two branches for spectral feature extraction and hybrid feature extraction respectively. The spectral branch extracts features step-by-step through densely connected dilated convolution, and then fuses low, medium and high level semantic information as the feature output. The hybrid branch uses a 3D-2D network architecture and extracts spatial scale features through improved Inception blocks. In addition, the attention mechanism is applied to spectral, spatial and spatial-spectral feature extraction respectively for feature refinement and to enhance the feature response in important regions. Finally, the refined features of different dimensions are jointly input to the classifier for classification. Experiments using 5% and 1% samples on the Indian Pines and Salinas Valley datasets achieve an overall accuracy of 98.40% and 99.78% respectively, and the proposed model performs better in terms of accuracy and stability compared to the other six network architectures.
    VR Interactive 3D Virtual Crane Modeling and Simulation
    HUANG Kaige, HUI Yanbo, LIU Yonggang, WANG Hongxiao, WANG Qiao
    Computer Engineering and Applications    2024, 60 (7): 204-211.   DOI: 10.3778/j.issn.1002-8331.2211-0173
    Cranes are widely used special equipment, yet their operation is highly dangerous and prone to safety accidents. To reduce accidents caused by improper operation, the state attaches great importance to crane safety training, but current training is mostly traditional demonstration training with high cost and poor results. Virtual reality technology offers immersion, interaction, and multi-sensory perception. Based on this, this study uses virtual reality technology to build a crane training and assessment system that greatly improves the effect of worker training. To reproduce the real working scene of the crane, reverse engineering is first used to model the crane and the crane workshop. Secondly, in view of the limited realism of the 3D virtual scene and the poor reusability of interactive models containing a large number of parts, a level-of-detail (LOD) model is used to build the geometric model of the crane, optimizing the realism and real-time performance of the system. Then, 3D scene roaming, collision detection, and fast crane navigation are realized on the Unity platform, and the training data and important crane parameters are stored in real time in a MySQL database. Finally, the virtual platform is validated against an existing bridge crane platform. The results show that the crane virtual reality training system can greatly improve sensory training while reducing training cost, and the experiments show good results.
    Generative Adversarial Network with Dual Discriminator and Mixed Attention
    WANG Lei, YANG Jun, ZHANG Chiyu, DAI Zaiyan
    Computer Engineering and Applications    2024, 60 (7): 212-221.   DOI: 10.3778/j.issn.1002-8331.2211-0196
    In image generation tasks, how to improve the quality of generated images is a key problem. The multi-layer convolutional structure currently adopted by GANs has a local inductive bias and cannot focus on key information, so image features are lost during training. In this paper, a generative adversarial network with a dual discriminator and mixed attention, termed DDMA-GAN, is proposed. Firstly, DDMA-GAN designs a mixed attention mechanism that uses channel attention and spatial attention to fully capture image feature information. Secondly, to overcome the discrimination errors of a single discriminator, a dual-discriminator structure is proposed: a fusion coefficient is used to fuse the two judgment results so that the returned parameters are more objective, and a data augmentation module is embedded to further improve the robustness of the model. Finally, the hinge loss is used as the loss function to maximize the distance between real and fake samples. The model is verified on the public datasets LSUN and CelebA. Experimental results show that images generated by DDMA-GAN on these classical datasets are more realistic, and its FID and MMD are significantly reduced, which fully demonstrates the validity of the model.
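    A minimal sketch of the hinge GAN loss named above, together with a coefficient-weighted fusion of the two discriminators' scores (the mixing coefficient alpha is an assumption, not a value from the paper):

```python
import torch.nn.functional as F

def d_hinge_loss(real_logits, fake_logits):
    """Hinge loss for a discriminator: push real scores above +1, fake below -1."""
    return F.relu(1.0 - real_logits).mean() + F.relu(1.0 + fake_logits).mean()

def g_hinge_loss(fake_logits):
    """Hinge loss for the generator: raise the discriminator score on fakes."""
    return -fake_logits.mean()

def fuse_discriminator_scores(logits_d1, logits_d2, alpha=0.5):
    """Fuse the judgments of two discriminators with a mixing coefficient."""
    return alpha * logits_d1 + (1.0 - alpha) * logits_d2
```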
    Image Feature Classification Based on Multi-Agent Deep Reinforcement Learning
    ZHANG Zewei, ZHANG Jianxun, ZOU Hang, LI Lin, NAN Hai
    Computer Engineering and Applications    2024, 60 (7): 222-228.   DOI: 10.3778/j.issn.1002-8331.2211-0129
    To reduce the high complexity of input image data in machine learning tasks such as image feature recognition and classification, a multi-agent deep reinforcement learning method for image feature classification is proposed. Firstly, the image feature classification task is transformed into a partially observable Markov decision process. Multiple moving homogeneous agents collect parts of the image information, and the method studies how the agents form a local understanding of the image, take actions, and extract and classify relevant features from locally observed regions, thereby reducing data complexity and filtering out irrelevant data. Secondly, an improved value-function decomposition method is used to train the agents' policy networks, dividing the global return of the environment according to each agent's contribution and thus addressing the credit assignment problem among agents. The proposed method is verified on the MNIST handwritten digits dataset and the NWPU-RESISC45 remote sensing image dataset. Compared with the baseline algorithm, it learns more effective association strategies, and the classification process is more stable with improved accuracy.
    Camouflage Object Detection Algorithm Based on Edge Attention and Reverse Orientation
    HE Wenhao, GE Haibo, CHENG Mengyang, AN Yu, MA Sai
    Computer Engineering and Applications    2024, 60 (7): 229-237.   DOI: 10.3778/j.issn.1002-8331.2211-0211
    Camouflage object detection (COD) has important application value in many fields. The existing COD algorithm mainly focuses on the expression of the features extracted from the backbone network and the problem of feature fusion, ignoring the problems of focusing on the edge features of the object and inferring the real area of the object. Aiming at the above problems, a camouflaged object detection algorithm based on edge attention and reverse positioning is proposed. The algorithm consists of edge attention module (EAM), close integration module (CIM) and reverse positioning module (RPM). First, the EAM module is used in the feature encoding stage to enhance the expression of multi-level features extracted from the Res2Net-50 backbone network and highlight edge features. Then, the CIM module is used for the fusion of multi-level features to reduce the loss of feature information. Finally, the RPM module is used to process the rough prediction maps from different feature pyramids, reverse localize the real region of the object, and infer the real object. Experiments on 3 public datasets show that the proposed algorithm outperforms the other 8 state-of-the-art models. On the COD10K dataset, the mean absolute error (MAE) reaches 0.038.
    Learning Gaussian-Aware Constraint Spatial Anomaly for Correlated Filter Target Tracking
    JIANG Wentao, WANG Zimin, ZHANG Shengchong
    Computer Engineering and Applications    2024, 60 (7): 238-247.   DOI: 10.3778/j.issn.1002-8331.2211-0408
    To address target loss during tracking under complicated motion, a target tracking algorithm with Gaussian-aware constrained spatial anomaly is proposed. Firstly, feature sampling points of the target are established following a Gaussian uniform distribution, and the appearance model and weight model of the target are extracted with a convolutional structure. Secondly, to constrain spatial anomaly, a spatial regularization term is constructed in the objective function while the target weight model is updated to minimize spatial overfitting, thereby enhancing the tracker's adaptability to spatial anomalies. Lastly, the weighted least squares method is applied to obtain the center of the weight response model, determine the target center, and update the tracking position, enhancing the robustness of the tracker. On the OTB2015 and UAV20L datasets, the proposed algorithm, compared with other mainstream correlation filtering algorithms, shows high tracking success rate and accuracy under complicated conditions such as low resolution and occlusion caused by target motion.
    Hand Pose Estimation Based on Multi-Feature Enhancement
    FENG Xinxin, GAO Shu
    Computer Engineering and Applications    2024, 60 (6): 207-213.   DOI: 10.3778/j.issn.1002-8331.2210-0089
    Hand pose estimation is an important research direction in computer vision and plays an important role in human-computer interaction, virtual reality, robot control, and other applications. Current hand pose estimation suffers from relying on a single feature representation. This paper proposes a feature construction method for hand keypoint connection relationships and a keypoint feature aggregation and enhancement method based on hand motion semantics, improving hand feature representation and information sharing. For the occlusion problem in hand detection and image segmentation, a hand contour feature extraction method is designed to improve preprocessing. Based on the proposed multi-feature representation and enhancement methods, a deep neural network with a fully convolutional structure is constructed, avoiding the loss of spatial information caused by directly regressing 3D pose and thus effectively improving the accuracy of 3D hand pose estimation. Compared with SOTA models on the DO, ED, and RHD datasets, the method achieves competitive results, with an average AUC of 93.3%, indicating good generality.
    Commonsense Oriented Fine-Grained Data Augmentation
    LI Huachao, KANG Bin, WANG Lei
    Computer Engineering and Applications    2024, 60 (6): 214-221.   DOI: 10.3778/j.issn.1002-8331.2210-0361
    Representative research on data augmentation has mainly been carried out on common classification benchmarks such as ImageNet. Because the intra-class and inter-class relationships in fine-grained visual classification (FGVC) datasets differ greatly from those in ordinary classification datasets, data augmentation methods for FGVC need further study. Starting from the fine-grained recognition task and the special properties of its datasets, this paper proposes a commonsense-guided fine-grained semantic image patch mixing method (ComSipmix). The proposed method exploits commonsense knowledge to mine potential associations between sample labels and, on this basis, designs a multi-branch convolutional neural network with a structured image mixing strategy, so that the mixing process pays more attention to the subtle differences between targets. Extensive performance tests verify that the proposed method significantly outperforms mainstream image-mixing-based data augmentation methods, and experiments confirm that the commonsense knowledge introduced in this paper helps improve the performance of various mixing-based data augmentation models.
    CME-Based Few-Shot Detection Model with Enhanced Multiscale Deep Features
    DING Zhengwei, BAI Hexiang, HU Shen
    Computer Engineering and Applications    2024, 60 (6): 222-229.   DOI: 10.3778/j.issn.1002-8331.2211-0419
    A CME-based few-shot detection model with enhanced multiscale deep features is proposed to address two problems of existing few-shot detection models: insufficient use of the global semantic information of images, and detector performance degradation caused by varying input image sizes. The model is first trained with a large amount of labeled base-class data, using a multilayer convolutional neural network with residual skip connections and a multiscale feature enhancement module with good generalization; it is then fine-tuned with a small amount of labeled new-class data together with base-class data, and the fine-tuned model is finally used for target detection. To verify the effectiveness of the model, the VOC2007 and VOC2012 datasets are used for training and evaluation. Ablation experiments demonstrate that the multilayer convolutional neural network with residual skip connections and the multiscale feature enhancement module each improve accuracy, and improve it further when combined. In comparison experiments with six representative few-shot detection models, the CME model with enhanced multiscale deep features outperforms the state-of-the-art detector by an average of 4.75 percentage points.
    Small Target-Oriented Multi-Space Hierarchical Helmet Detection
    LI Jiaxin, HU Yang, HUANG Xiezhou, LI Hongjun
    Computer Engineering and Applications    2024, 60 (6): 230-237.   DOI: 10.3778/j.issn.1002-8331.2210-0353
    Because factors such as small targets and long distances in surveillance video affect detection, small targets are difficult to capture. This article proposes a multi-spatial hierarchical helmet-wearing detection algorithm for small targets, improved on the basis of the YOLOv5s network model. Firstly, a multi-spatial attention module is designed to consider the effects of spatial features from different perspectives and fuse them, enhancing the spatial location relationships of salient features. Secondly, features at multiple spatial scales are fused and multiple features are combined during feature extraction to adapt to targets at different spatial levels and improve small-target detection. Thirdly, data augmentation is used to improve generalization so that training adapts to more diverse scenarios. Finally, the loss function is optimized to enhance regression capability and improve training. The experimental results show that the proposed algorithm achieves an average accuracy of 91.5% and significantly reduces missed detections. In addition, the algorithm has been deployed on real construction sites and shows superior performance in detecting small targets, which is of great application value.
    Expression Recognition Combining 3D Interactive Attention and Semantic Aggregation
    WANG Guangyu, LUO Xiaoshu, XU Zhaoxing, FENG Fangyu, XU Jiangjie
    Computer Engineering and Applications    2024, 60 (6): 238-248.   DOI: 10.3778/j.issn.1002-8331.2210-0398
    A facial expression recognition method combining 3D augmented attention and semantic aggregation is proposed to address the difficulties traditional convolutional networks have in effectively integrating facial expression features from different stages, their feature expression bottlenecks, and their inefficient use of contextual semantics. Firstly, the rank expansion network (ReXNet) is optimized to fuse contextual features while eliminating expression bottlenecks, making it more suitable for expression recognition. Secondly, to capture discriminative fine-grained facial expression features, 3D augmented attention is constructed by combining non-local blocks with cross-dimensional information interaction. Finally, to fully utilize the shallow and mid-level low-level features and the high-level semantic features of expressions, a semantic aggregation module is designed to aggregate multi-level global contextual features with high-level semantic information, achieving mutual semantic gain between expressions of the same class and enhancing intra-class consistency. Experiments show that the accuracy of the method is 88.89%, 89.53%, and 62.22% on the public datasets RAF-DB, FERPlus, and AffectNet-8, respectively, demonstrating its advancement.
    Semi-Supervised Object Detection Algorithm Based on Localization Confidence Weighting
    FENG Zeheng, WANG Feng
    Computer Engineering and Applications    2024, 60 (6): 249-258.   DOI: 10.3778/j.issn.1002-8331.2210-0400
    Wavelet Frequency Division Self-Attention Transformer Image Deraining Network
    FANG Siyan, LIU Bin
    Computer Engineering and Applications    2024, 60 (6): 259-273.   DOI: 10.3778/j.issn.1002-8331.2211-0099
    In view of the weak ability of the vision Transformer (ViT) to capture high-frequency information and the tendency of many image deraining methods to lose details, a wavelet frequency division self-attention Transformer image deraining network (WFDST-Net) is proposed. As the main module of WFDST-Net, the wavelet frequency division self-attention Transformer (WFDST) uses a non-separable lifting wavelet transform to obtain the low-frequency and high-frequency components of the feature map and carries out self-attention interaction in the low and high frequencies respectively, so that the module learns to restore the overall structure from the low frequency and strengthens its ability to capture line details such as rain streaks in the high frequency, enhancing the modeling of different frequency-domain features. WFDST-Net adopts a U-shaped architecture and obtains multi-scale features through the non-separable lifting wavelet transform, capturing high-frequency rain streaks of different shapes while ensuring the integrity of information. WFDST-Net has fewer parameters than other Transformers related to image deraining. In addition, the VOCRain250 dataset is proposed for the task of joint image deraining and semantic segmentation, which has advantages over the currently widely used BDD150. The experimental results show that the proposed method enhances the ability of ViT to capture different frequency-domain information and outperforms current state-of-the-art deraining methods on synthetic and real-world datasets and in joint semantic segmentation tasks, effectively removing complex rain streaks while retaining more background details.
    Lightweight Object Detection Method for Constrained Environments
    QU Haicheng, YUAN Xudong, LI Jiaqi
    Computer Engineering and Applications    2024, 60 (6): 274-281.   DOI: 10.3778/j.issn.1002-8331.2211-0283
    The lightweight design of object detection models plays an important role in environments with limited computing resources and storage space. To further compress model size and improve detection accuracy, a higher-performance lightweight object detection model, Lite-YOLOX, is proposed, which improves the feature pyramid structure, the decoupled head structure, and the loss function of the YOLOX-Tiny model. Firstly, to further compress the original model, the feature pyramid and decoupled head are redesigned to make the neck and head of the model lighter. Then, to improve detection accuracy, the EIoU loss function, which is more sensitive to the position of the ground-truth box, is designed to optimize the proposed model. Finally, validation experiments are performed on the Pascal VOC and safety-helmet-wearing datasets. The experimental results show that, compared with YOLOX-Tiny, Lite-YOLOX reduces the parameters by 40% and the floating-point operations by 37.5%, while mAP50 increases by 3.2 and 3.1 percentage points on the two datasets. On the NVIDIA Jetson Xavier NX, the frame rate increases from 51 to 59 FPS, a clear improvement in real-time performance.
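    The EIoU loss named above adds center-distance and width/height penalties, each normalized by the smallest enclosing box, to the plain IoU loss; a minimal sketch (corner-format boxes assumed):

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU: 1 - IoU plus penalties on center distance and on width/height
    differences, each normalized by the smallest enclosing box (boxes: x1,y1,x2,y2)."""
    px1, py1, px2, py2 = pred.unbind(-1)
    tx1, ty1, tx2, ty2 = target.unbind(-1)
    inter = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0) * \
            (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter + eps
    iou = inter / union
    # Smallest enclosing box.
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Center, width, and height penalties.
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4
    dw2 = ((px2 - px1) - (tx2 - tx1)) ** 2
    dh2 = ((py2 - py1) - (ty2 - ty1)) ** 2
    return 1 - iou + rho2 / c2 + dw2 / (cw ** 2 + eps) + dh2 / (ch ** 2 + eps)
```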
    Multi-Object Tracking with Spatial-Temporal Embedding Perception and Multi-Task Synergistic Optimization
    LIANG Xiaoguo, LI Hui, CHENG Yuanzhi, CHEN Shuangmin, LIU Hengyuan
    Computer Engineering and Applications    2024, 60 (6): 282-292.   DOI: 10.3778/j.issn.1002-8331.2211-0385
    To address the tracking challenges caused by frequent occlusion, crowded scenes, and variable object scales in multi-object tracking, a multi-object tracking method based on spatial-temporal embedding perception and multi-task synergistic optimization is proposed. Firstly, a spatial correlation module is proposed to extract discriminative embeddings with object context awareness in space. Secondly, a temporal correlation module aggregates the embeddings extracted by the spatial correlation module, and the aggregated embeddings generate temporal attention that guides the spatial correlation module to extract more discriminative embeddings under frequent occlusion and in crowded scenes. The discriminative embeddings enhance association robustness while more accurate detection boxes are predicted to overcome scale variability, and the accurate boxes in turn help the proposed modules extract higher-quality embeddings; in this way, synergistic optimization among the embedding extraction, position prediction, and data association tasks is achieved. Finally, the GIoU distance between detection boxes is introduced into the affinity matrix to further improve association robustness in occluded and crowded scenes. Experimental results on the MOT16, MOT17, and MOT20 datasets show that the proposed method achieves tracking performance superior to state-of-the-art methods.
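    The GIoU distance used in the affinity matrix above can be computed box-pair by box-pair; a minimal sketch:

```python
def giou_distance(box_a, box_b, eps=1e-7):
    """1 - GIoU between two boxes (x1, y1, x2, y2); usable as a motion cue
    in an association affinity matrix."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * \
            max(0.0, min(ay2, by2) - max(ay1, by1))
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter + eps
    iou = inter / union
    # Smallest box enclosing both.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1)) + eps
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou
```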
    Research on Pedestrian Multi-Object Tracking Algorithm Under OMC Framework
    HE Yuting, CHE Jin, WU Jinman, MA Pengsen
    Computer Engineering and Applications    2024, 60 (5): 172-182.   DOI: 10.3778/j.issn.1002-8331.2211-0344
    Multi-object tracking is an important and widely studied direction in computer vision, but in practical applications rapid target motion, lighting changes, and occlusion lead to poor tracking performance. This work therefore takes the multi-object tracking model OMC as the basic framework and seeks to further improve tracking performance. Firstly, to address the uneven quality of target features in multi-object tracking, the feature extractor is optimized by integrating the GAM attention mechanism into the backbone network and replacing the upsampling method in the neck network. Secondly, to address the competition between the detection and re-identification tasks in existing methods, a recursive cross-correlation network is constructed so that the model can learn the characteristics and commonalities of the different tasks. The two sub-tasks are then optimized separately: on the one hand, a new channel attention, HS-CAM, is designed to optimize the re-identification network; on the other hand, the bounding box regression loss of the detection branch is replaced with the EIoU loss function. Experiments show that on the MOT16 dataset the MOTA reaches 73.5%, IDF1 reaches 70.4%, and ML is 11.7%, a 1.5-percentage-point reduction compared with the OMC algorithm.
    UAV Small Object Detection Algorithm Based on Context Information and Feature Refinement
    PENG Yanfei, ZHAO Tao, CHEN Yankang, YUAN Xiaolong
    Computer Engineering and Applications    2024, 60 (5): 183-190.   DOI: 10.3778/j.issn.1002-8331.2305-0401
    Object detection in UAV aerial images has been a research hotspot in recent years. Aiming at the low detection accuracy caused by small, dense objects and complex backgrounds from the UAV perspective, a UAV small object detection algorithm based on context information and feature refinement is proposed. Firstly, a context feature enhancement module uses multi-scale dilated convolution to capture the potential relationships between a pixel and its surrounding area, supplementing the network with context information; output weights for each level of feature map are generated adaptively according to the feature layers of different scales, dynamically optimizing the expressiveness of the feature maps. Secondly, because feature maps differ in fineness, a feature refinement module is used to suppress conflicting information during feature fusion and prevent small object features from being drowned out. Finally, a weighted loss function is designed to accelerate model convergence and further improve small object detection accuracy. Extensive experiments on the VisDrone2021 dataset show that the improved model outperforms the benchmark by 8.4 percentage points in mAP50 and 5.9 percentage points in mAP50:95, with an FPS of 42, effectively improving the detection accuracy of small objects in UAV aerial images.
    Re-Parameterized YOLOv8 Pavement Disease Detection Algorithm
    WANG Haiqun, WANG Bingnan, GE Chao
    Computer Engineering and Applications    2024, 60 (5): 191-199.   DOI: 10.3778/j.issn.1002-8331.2309-0354
    Road disease detection is an important way to ensure traffic safety. To detect pavement diseases promptly and accurately, a re-parameterized YOLOv8 pavement disease detection model is proposed. First of all, a CNX2f module is introduced into the backbone network to improve the extraction of pavement disease features and effectively address the problem that disease features are easily confused with background features. Secondly, the RepConv and DBB re-parameterization modules are introduced to enhance multi-scale feature fusion and handle the large scale differences among pavement diseases. At the same time, the shared-parameter structure of the head is improved and an RBB re-parameterization module is introduced to reduce head parameter redundancy and improve feature extraction. Finally, an SPPF_Avg module is introduced to reduce the loss of pavement features and enrich the multi-scale feature representation. The experimental results show that the improved network reaches a precision of 73.3%, a recall of 62.3%, and an mAP of 69.3%, which are 2.6, 3.0, and 2.8 percentage points higher, respectively, than those of the YOLOv8 network, improving the detection effect of the model.
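    A minimal sketch of the branch-fusion step that underlies RepConv/DBB-style re-parameterization: folding a BatchNorm layer into the preceding convolution so that, at inference time, the branch becomes a single convolution (this is the common core step, not the paper's full module):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BatchNorm statistics into the preceding conv's weight and bias."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)          # gamma / sigma
    conv_bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    w = (conv.weight * scale.reshape(-1, 1, 1, 1)).detach()
    b = (bn.bias + (conv_bias - bn.running_mean) * scale).detach()
    fused.weight.data.copy_(w)
    fused.bias.data.copy_(b)
    return fused
```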
    Traffic Sign Detection Algorithm Based on Improved YOLOv5-S
    LIU Haibin, ZHANG Youbing, ZHOU Kui, ZHANG Yufeng, LYU Sheng
    Computer Engineering and Applications    2024, 60 (5): 200-209.   DOI: 10.3778/j.issn.1002-8331.2306-0293
    In the field of autonomous driving, existing traffic sign detection methods miss or misdetect signs in complex backgrounds, reducing the reliability of intelligent vehicles. To address this, a real-time traffic sign detection algorithm that enhances YOLOv5-S is proposed. Firstly, the coordinate attention mechanism is integrated into the feature extraction network to perceive object locations by establishing long-range dependencies on the target, making the algorithm focus on high-priority regions. Secondly, the Focal-EIoU loss function replaces CIoU, allowing the network to focus more on high-quality classification samples, improving its ability to learn from difficult samples and reducing missed and false detections. Next, the lightweight convolution technique GSConv is integrated into the network to reduce model complexity. Finally, a new small-target detection layer is added, using richer feature information to improve the detection of small signs. The experimental results show that the improved algorithm achieves 88.1% mAP@0.5 and 68.5% mAP@0.5:0.95 at a detection speed of 83 FPS, meeting the requirements of real-time and reliable detection.
    Multi-Coupled Feedback Networks for Image Fusion and Super-Resolution Methods
    WANG Rong, DUANMU Chunjiang
    Computer Engineering and Applications    2024, 60 (5): 210-220.   DOI: 10.3778/j.issn.1002-8331.2212-0118
    People often need high-dynamic-range, high-resolution images in daily life. However, owing to equipment limitations, high-dynamic-range images are usually obtained by multi-exposure fusion (MEF) of low-dynamic-range images, and high-resolution images by super-resolution (SR) of low-resolution images; MEF and SR are usually studied separately. To obtain high dynamic range and high resolution at the same time, this paper proposes a multi-coupled feedback network (MCF-Net) and its method, building on a study of existing methods. The model consists of N subnets and an output module. In the method, the N downsampled images I_i^lr, I_m^lr, I_-i^lr are first input to the N subnets, and the low-resolution features F_i^lr, F_m^lr, F_-i^lr are extracted; the super-resolution features G_i^0, G_m^0, G_-i^0 of the corresponding images are then extracted from the low-resolution features. The fused high-resolution features G_i^t, G_m^t, G_-i^t are obtained and fed to the next multi-coupled feedback block (MCFB), until the T-th MCFB produces the fused high-resolution features G_i^T, G_m^T, G_-i^T, from which the corresponding fused super-resolution images I_i^t, I_m^t, I_-i^t are obtained. Finally, the high-dynamic-range, super-resolution image I_out is obtained by fusing the outputs I_i^T, I_m^T, I_-i^T of the T-th reconstruction module (REC) of the N subnets. The performance is verified experimentally on the SICE dataset and compared with 33 existing methods; the results show clear improvements on each of the following metrics: structural similarity (SSIM) reaches 0.8332, peak signal-to-noise ratio (PSNR) reaches 22.07 dB, and multi-exposure fusion similarity (MEF-SSIM) reaches 0.9378.
    Image Super-Resolution Reconstruction Algorithm with Adaptive Aggregation of Hierarchical Information
    CHEN Weijie, HUANG Guoheng, MO Fei, LIN Junyu
    Computer Engineering and Applications    2024, 60 (5): 221-231.   DOI: 10.3778/j.issn.1002-8331.2210-0155
    Abstract34)      PDF(pc) (688KB)(44)       Save
    With the development of convolutional neural networks, image super-resolution reconstruction algorithms have made notable breakthroughs. Nevertheless, existing image super-resolution algorithms rarely distinguish how hierarchical features are used and suffer from costly multi-scale feature extraction. To address these problems, this paper proposes an image super-resolution reconstruction algorithm with adaptive aggregation of hierarchical information. Specifically, the algorithm applies a multi-level information refinement mechanism to adaptively enhance features at different levels, solving the problem that hierarchical features are not used distinctively. In addition, a fine-grained multi-scale information aggregation block is constructed to address the high cost of multi-scale information extraction and the poor feature representation capability. Finally, the algorithm employs contrast-enhanced recombinant attention blocks to achieve adaptive calibration of features at a lower cost by exploiting channel and spatial information. Extensive experiments show that, compared with other advanced algorithms, the proposed method achieves better metrics and visual results on five benchmark datasets such as Urban100.
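    As a rough illustration of attention-based feature calibration using both channel and spatial information, the following is a minimal PyTorch sketch of a contrast-aware channel attention followed by a spatial attention; the pooling statistics, reduction ratio and kernel sizes are assumptions and do not reproduce the paper's recombinant attention block.

    import torch
    import torch.nn as nn

    class ContrastChannelAttention(nn.Module):
        def __init__(self, ch, reduction=8):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
                nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

        def forward(self, x):
            # "Contrast" statistic: per-channel mean plus standard deviation
            stat = x.mean(dim=(2, 3), keepdim=True) + x.std(dim=(2, 3), keepdim=True)
            return x * self.mlp(stat)

    class SpatialAttention(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, 7, padding=3)

        def forward(self, x):
            # Squeeze channels into mean/max maps, then predict a spatial mask
            s = torch.cat([x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True)[0]], dim=1)
            return x * torch.sigmoid(self.conv(s))

    calibrated = SpatialAttention()(ContrastChannelAttention(64)(torch.randn(1, 64, 48, 48)))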
    Reference | Related Articles | Metrics
    Improved UNet++ for Tree Rings Segmentation of Chinese Fir CT Images
    LIU Shuai, GE Zhedong, LIU Xiaotong, GAO Yisheng, LI Yang, LI Mengfei
    Computer Engineering and Applications    2024, 60 (5): 232-239.   DOI: 10.3778/j.issn.1002-8331.2210-0212
    Abstract51)      PDF(pc) (894KB)(63)       Save
    To solve the problem that tree rings with defects such as cracks, wormholes and knots are difficult to segment accurately, a medical CT scanner is used as the experimental equipment to reconstruct 125 CT images of Chinese fir transverse sections, and these images serve as the dataset, which is expanded by preprocessing such as cropping, rotating and flipping. An improved UNet++ model is proposed for tree ring segmentation: convolutional blocks, downsampling layers, skip connections and upsampling layers are added, increasing the learning depth to 6 levels, and BCEWithLogitsLoss, ReLU and RMSProp are used as the loss function, activation function and optimizer, respectively. The improved UNet++ model is applied to segment the tree rings of the CT-reconstructed Chinese fir transverse sections, and its performance is evaluated. The results show that the improved UNet++ model achieves a pixel accuracy of 97.81%, a Dice coefficient of 98.89%, an intersection over union of 95.29% and a mean intersection over union of 84.75%, obtaining the best segmentation results by fully extracting the characteristics of Chinese fir tree rings. Compared with the U-Net and UNet++ models, the improved UNet++ model produces complete and continuous tree rings: although most rings are cut by cracks and wormholes and do not form a complete closed curve, fractures and noise are eliminated in the segmentation. The improved UNet++ model is thus unaffected by defects such as cracks, knots and wormholes, and its clear segmentation results effectively resolve the mis-segmentation and under-segmentation of dense tree rings under the interference of wormhole defects.
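    A minimal training-step sketch matching the configuration stated above (BCEWithLogitsLoss, ReLU activations, RMSProp) is given below; the stand-in model, image size and learning rate are placeholders rather than the authors' 6-level UNet++.

    import torch
    import torch.nn as nn

    model = nn.Sequential(                                  # stand-in for the deepened UNet++
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(16, 1, 3, padding=1))                     # one logit per pixel (ring / background)

    criterion = nn.BCEWithLogitsLoss()                      # loss function named in the abstract
    optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)

    ct_batch = torch.randn(4, 1, 256, 256)                  # preprocessed CT transverse sections
    ring_mask = torch.randint(0, 2, (4, 1, 256, 256)).float()

    loss = criterion(model(ct_batch), ring_mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()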
    Reference | Related Articles | Metrics
    Facial Expression Generation Based on Group Residual Block Generative Adversarial Network
    LIN Benwang, ZHAO Guangzhe, WANG Xueping, LI Hao
    Computer Engineering and Applications    2024, 60 (5): 240-249.   DOI: 10.3778/j.issn.1002-8331.2210-0234
    Abstract31)      PDF(pc) (983KB)(31)       Save
    Facial expression generation produces facial images with specified expressions through an expression synthesis method and is widely used in face editing, film and television production, and data augmentation. With the advent of generative adversarial networks (GAN), facial expression generation has made significant progress, but problems such as overlapping, blurring and lack of realism still occur in the generated images. To address these issues, a group residual with attention mechanism generative adversarial network (GRA-GAN) is proposed to generate high-quality facial expressions. Firstly, an adaptive mixed attention mechanism (MAT) is embedded in the generative network before downsampling and after upsampling to adaptively learn key region features and enhance the learning of key regions of the image. Secondly, the idea of grouping is integrated into the residual network, and a group residual block with attention mechanism (GRA) is proposed to achieve a better generation effect. Finally, experimental verification is carried out on the public RaFD dataset. The experimental results show that the proposed GRA-GAN outperforms related methods in both qualitative and quantitative analysis.
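    The grouped residual idea above can be illustrated with the minimal PyTorch sketch below, which combines grouped convolutions, a residual connection and a lightweight channel-attention gate; the group count, normalisation and attention form are assumptions, not the GRA-GAN implementation.

    import torch
    import torch.nn as nn

    class GroupResidualAttentionBlock(nn.Module):
        def __init__(self, ch=64, groups=4):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1, groups=groups),   # grouped convolution
                nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1, groups=groups),
                nn.InstanceNorm2d(ch))
            self.att = nn.Sequential(                             # channel-attention gate
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(ch, ch // 8, 1), nn.ReLU(inplace=True),
                nn.Conv2d(ch // 8, ch, 1), nn.Sigmoid())

        def forward(self, x):
            res = self.body(x)
            return x + res * self.att(res)                        # attended residual connection

    out = GroupResidualAttentionBlock()(torch.randn(1, 64, 128, 128))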
    Reference | Related Articles | Metrics
    Improved YOLOX-s Dense Garbage Detection Method
    XIE Ruobing, LI Maojun, LI Yiwei, HU Jianwen
    Computer Engineering and Applications    2024, 60 (5): 250-258.   DOI: 10.3778/j.issn.1002-8331.2210-0235
    Abstract49)      PDF(pc) (837KB)(47)       Save
    To address the problems of low recognition rates, inaccurate localization, and false and missed detections in densely stacked multi-category garbage detection, a garbage detection method incorporating a multi-head self-attention mechanism to improve YOLOX-s is proposed. Firstly, the Swin Transformer module is embedded in the feature extraction network, introducing a multi-head self-attention mechanism based on sliding-window operations so that the network accounts for both global and key feature information, reducing false detections. Secondly, deformable convolution is used in the prediction head to refine the initial prediction boxes and improve localization accuracy. Finally, on the basis of EIoU, loss weighting coefficients are introduced to form a weighted IoU-EIoU loss, which adaptively adjusts the emphasis on different losses at different stages of training to further accelerate convergence. Tests on a public 204-class garbage detection dataset show that the mean average precision of the proposed improved algorithm reaches 80.5% and 92.5%, respectively, outperforming currently popular target detection algorithms, while the detection speed is fast enough to meet real-time requirements.
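    The prediction-box refinement step above can be sketched with torchvision's deformable convolution as follows; the offset predictor, channel width and feature-map size are assumptions used only to show the mechanism.

    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    ch, k = 256, 3
    offset_pred = nn.Conv2d(ch, 2 * k * k, 3, padding=1)   # predicts (dx, dy) per kernel tap
    deform = DeformConv2d(ch, ch, k, padding=1)

    feat = torch.randn(1, ch, 20, 20)                      # prediction-head feature map
    offset = offset_pred(feat)                             # learned sampling offsets
    refined = deform(feat, offset)                         # refined features for box regression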
    Reference | Related Articles | Metrics
    Ship Target Detection Method Combining Visual Saliency and EfficientNetV2
    LIANG Xiuya, FENG Shuichun, CHEN Hongzhen
    Computer Engineering and Applications    2024, 60 (5): 259-270.   DOI: 10.3778/j.issn.1002-8331.2210-0267
    Abstract50)      PDF(pc) (1320KB)(42)       Save
    With the increasing resolution of optical remote sensing images, fast and accurate detection of ship targets at sea has become one of the basic challenges of maritime research. To address the problems encountered in detection, such as large image sizes with sparse targets, complex background interference, poor timeliness of target extraction, and heavy model computation, a practical ship detection scheme is proposed. Visual saliency is introduced to effectively accelerate pre-screening: the difference between the ship target region and the background is expressed by wavelet decomposition coefficients, which enhance the directional characteristics of targets while suppressing noise, and the saliency map is generated by an improved model based on the phase spectrum of the quaternion Fourier transform (PQFT). In addition, the Gini index is exploited to guide multi-scale saliency map fusion, improving scale adaptability and the saliency of small targets. Compared with other saliency methods, the proposed model effectively suppresses the interference of complex environments such as cloud, fog, sea clutter and ship wakes; more importantly, it produces a smaller set of candidate regions than classical sliding-window or other region-proposal methods. After the saliency map is obtained, the adaptive OTSU thresholding method is employed for its binary segmentation. In the target discrimination stage, the lightweight network EfficientNetV2 is exploited to effectively eliminate false alarms. The experimental results show that the proposed ship detection method is highly robust, achieves an accuracy of up to 96%, and meets real-time requirements.
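    The candidate-extraction step described above (OTSU binarisation of the saliency map followed by region extraction for the EfficientNetV2 discriminator) can be sketched with standard OpenCV calls; the saliency map here is a random placeholder.

    import cv2
    import numpy as np

    saliency = np.random.rand(512, 512).astype(np.float32)              # placeholder saliency map
    sal_u8 = cv2.normalize(saliency, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # OTSU chooses the threshold adaptively (the passed value 0 is ignored)
    _, mask = cv2.threshold(sal_u8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Connected components give candidate regions to pass to the discriminator
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    candidate_boxes = [tuple(stats[i, :4]) for i in range(1, num)]      # (x, y, w, h) per region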
    Reference | Related Articles | Metrics
    Dual-Branch Low-Light Image Enhancement Combined with Dense Wavelet Transform
    CHEN Junjie, ZHOU Yongxia, ZU Jiazhen, SHEN Wei, ZHAO Ping
    Computer Engineering and Applications    2024, 60 (4): 200-210.   DOI: 10.3778/j.issn.1002-8331.2209-0470
    Abstract42)      PDF(pc) (3662KB)(42)       Save
    A dual-branch image enhancement method combining a dense wavelet transform is proposed to solve the problems of low brightness, high noise, and color distortion in low-light images. Firstly, dense wavelet networks are used for multi-scale feature fusion to reduce information loss and provide denoising capability. Then, a global attention module and a feature extraction module are embedded in the multi-scale feature fusion to fully extract global and local features. Finally, low-light images are enhanced through color enhancement and detail reconstruction in a dual-branch structure. In addition, a new joint loss function is introduced to guide network training from multiple aspects and improve performance. The experimental results show that the proposed method effectively improves the brightness of low-light images, suppresses image noise, and recovers richer detail and color information; the enhanced images are clearer and more natural, and the peak signal-to-noise ratio and structural similarity show significant advantages over mainstream methods.
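    As a small illustration of wavelet-based multi-scale decomposition of a low-light image, the following sketch uses PyWavelets with a Haar basis; the wavelet choice and the way sub-bands would be reused in a dense network are assumptions, not the paper's architecture.

    import numpy as np
    import pywt

    img = np.random.rand(256, 256).astype(np.float32)       # one channel of a low-light image
    cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')                # approximation + H/V/D detail bands

    # cA carries low-frequency illumination; cH/cV/cD carry edges and texture, so later
    # stages can brighten and denoise without discarding structural information.
    reconstructed = pywt.idwt2((cA, (cH, cV, cD)), 'haar')   # lossless inverse transform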
    Reference | Related Articles | Metrics