Contents of the Graphics and Image Processing section in this journal

    Lightweight Traffic Monitoring Object Detection Algorithm Based on Improved YOLOX
    HU Weichao, GUO Yuyang, ZHANG Qi, CHEN Yanyan
    Computer Engineering and Applications    2024, 60 (7): 167-174.   DOI: 10.3778/j.issn.1002-8331.2308-0081
    Traffic target detection technology is an important tool for traffic management departments in key tasks such as traffic monitoring and safety surveillance. Given the large volume of traffic monitoring scene data, detection techniques are needed that offer fast detection speed, high accuracy and low computational resource usage. To meet this need, this paper proposes PL-YOLO, a lightweight traffic target detection algorithm for traffic monitoring scenes built on the YOLOX algorithm and the PP-LCNet network. Furthermore, considering the dense distribution and small size of vehicles in traffic monitoring scenes, the SimAM attention module is added to focus on more meaningful features. Experimental results demonstrate that, compared with the YOLOX-s model, PL-YOLO improves detection accuracy by 1.89 percentage points, reduces the model size by 54%, and raises the detection speed from 20.88 frames/s to 26.68 frames/s.
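    The SimAM module cited above is a published parameter-free attention mechanism. As a point of reference, a minimal PyTorch sketch of the standard SimAM formulation is given below; it is not the authors' code, and the regularization constant e_lambda is an assumed default.

        import torch

        def simam(x: torch.Tensor, e_lambda: float = 1e-4) -> torch.Tensor:
            """Parameter-free SimAM attention: re-weight each activation by an
            energy-based importance score (more distinctive neurons get larger weights)."""
            # x: (batch, channels, height, width)
            n = x.shape[2] * x.shape[3] - 1
            # squared deviation of every spatial position from its channel mean
            d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
            # per-channel variance estimate
            v = d.sum(dim=(2, 3), keepdim=True) / n
            # inverse energy; the 0.5 offset follows the SimAM paper
            e_inv = d / (4 * (v + e_lambda)) + 0.5
            return x * torch.sigmoid(e_inv)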
    Improved YOLOv7 Algorithm for Wood Surface Defect Detection
    JIANG Xingwang, ZHAO Xingqiang
    Computer Engineering and Applications    2024, 60 (7): 175-182.   DOI: 10.3778/j.issn.1002-8331.2309-0185
    High-quality wood is in great demand, but surface defects lead to low yield and low utilization of such wood. Deep learning object detection algorithms enable rapid and stable detection of wood surface defects, thereby improving the quality and utilization of wood. A wood surface defect detection model, YOLOv7-ESS, based on improved YOLOv7 is proposed to address the poor detection accuracy caused by the small size, dense distribution and complex shapes of wood surface defects. Firstly, to deal with the extreme aspect ratios that hamper the detection of wood crack defects, an attention module ECBAM is embedded to strengthen the model's feature extraction ability by increasing attention to extreme-aspect-ratio defects. Secondly, to counter the severe loss of feature information for small defects on the wood surface during feature extraction, a shallow weighted feature fusion network SFPN is introduced, which uses deep feature maps as output and effectively exploits shallow feature information to improve the recognition accuracy of small defects. Finally, the SIoU loss function is introduced to improve the convergence speed and accuracy of the model. The results show that the average detection accuracy of the YOLOv7-ESS model is 94.7%, which is 11.2 percentage points higher than that of YOLOv7 and meets the defect detection requirements of wood production and processing.
    DY-YOLOv5: Target Detection for Aerial Image Based on Multiple Attention
    ZHAO Xin, CHEN Lili, YANG Weichuan, ZHANG Chengwang
    Computer Engineering and Applications    2024, 60 (7): 183-191.   DOI: 10.3778/j.issn.1002-8331.2309-0419
    Aiming at the problem of low detection accuracy caused by small targets, different scales and complex backgrounds in UAV aerial images, a target detection algorithm for UAV aerial images based on improved YOLOv5 is proposed. The algorithm introduces a target detection head method Dynamic Head with multiple attention mechanisms to replace the original detection head and improves the detection performance of the detection head in complex backgrounds. An upsampling and Concat operation is added to the neck part of the original model, and a multi-scale feature detection including minimal, small and medium targets is performed to improve the feature extraction ability of the model for medium and small targets. DenseNet is introduced and integrated with the C3 module of YOLOv5s backbone network to propose the C3_DenseNet module to enhance feature transfer and prevent model overfitting. The DY-YOLOv5 algorithm is applied to the VisDrone 2019 dataset, and the mean average precision (mAP) reaches 43.9%, which is 11.4 percentage points higher than the original algorithm. The recall rate (Recall) is 41.7%, which is 9.0 percentage points higher than the original algorithm. Experimental results show that the improved algorithm significantly improves the accuracy of target detection in UAV aerial images.
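    The abstract does not spell out the internals of the C3_DenseNet module; the sketch below only illustrates the dense-connection idea it builds on, where every convolution sees the concatenation of the block input and all earlier outputs. The class name, growth rate and layer count are assumptions for illustration.

        import torch
        import torch.nn as nn

        class DenseBlock(nn.Module):
            """Minimal DenseNet-style block: each conv receives the concatenation
            of the block input and all previous conv outputs."""
            def __init__(self, in_ch: int, growth: int = 32, layers: int = 3):
                super().__init__()
                self.convs = nn.ModuleList()
                ch = in_ch
                for _ in range(layers):
                    self.convs.append(nn.Sequential(
                        nn.Conv2d(ch, growth, kernel_size=3, padding=1, bias=False),
                        nn.BatchNorm2d(growth),
                        nn.SiLU(inplace=True)))
                    ch += growth  # the concatenated input grows with every layer

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                feats = [x]
                for conv in self.convs:
                    feats.append(conv(torch.cat(feats, dim=1)))
                return torch.cat(feats, dim=1)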
    Hyperspectral Image Classification Based on Double Branch Multidimensional Attention Feature Fusion
    MA Yamei, WANG Shuangting, DU Weibing
    Computer Engineering and Applications    2024, 60 (7): 192-203.   DOI: 10.3778/j.issn.1002-8331.2211-0139
    To improve the classification performance of small sample classes of hyperspectral images and to enhance the robustness of the model feature representation, a neural network classification model with two-branch multidimensional attentional feature fusion (DBMD) is proposed. DBMD uses two branches for spectral feature extraction and hybrid feature extraction respectively. The spectral branch extracts features step-by-step through densely connected dilated convolution, and then fuses low, medium and high level semantic information as the feature output. The hybrid branch uses a 3D-2D network architecture and extracts spatial scale features through improved Inception blocks. In addition, the attention mechanism is applied to spectral, spatial and spatial-spectral feature extraction respectively for feature refinement and to enhance the feature response in important regions. Finally, the refined features of different dimensions are jointly input to the classifier for classification. Experiments using 5% and 1% samples on the Indian Pines and Salinas Valley datasets achieve an overall accuracy of 98.40% and 99.78% respectively, and the proposed model performs better in terms of accuracy and stability compared to the other six network architectures.
    VR Interactive 3D Virtual Crane Modeling and Simulation
    HUANG Kaige, HUI Yanbo, LIU Yonggang, WANG Hongxiao, WANG Qiao
    Computer Engineering and Applications    2024, 60 (7): 204-211.   DOI: 10.3778/j.issn.1002-8331.2211-0173
    As widely used special equipment, cranes are highly dangerous in operation and prone to safety accidents. In order to reduce accidents caused by improper operation, the state attaches great importance to crane safety training. Current training is mostly traditional demonstration-based training, with high cost and poor results. Virtual reality technology offers immersion, interaction and multi-sensory perception; based on this, this study uses virtual reality technology to build a crane training and assessment system, which greatly improves the effect of worker training. In order to reproduce the real working scene of the crane, reverse engineering technology is first used to model the crane and the crane workshop. Secondly, in view of the poor realism of the 3D virtual scene and the poor reusability of interactive models when a large number of models are present, the level of detail (LOD) technique is used to build the geometric model of the crane, optimizing the realism and real-time performance of the system. Then, 3D scene roaming, collision detection and fast navigation of the crane are realized on the Unity platform, and the training data and key crane parameters are stored in real time in a MySQL database. Finally, the virtual platform is validated against an existing bridge crane platform. The results show that the crane virtual reality training system can greatly improve the sensory quality of training while reducing training cost, and achieves good results in experiments.
    Generative Adversarial Network with Dual Discriminator and Mixed Attention
    WANG Lei, YANG Jun, ZHANG Chiyu, DAI Zaiyan
    Computer Engineering and Applications    2024, 60 (7): 212-221.   DOI: 10.3778/j.issn.1002-8331.2211-0196
    In image generation tasks, how to improve the quality of generated images is a key problem. The multi-layer convolutional structure adopted by current GANs suffers from local inductive bias and cannot focus on key information, so image features are lost during training. In this paper, a generative adversarial network with a dual discriminator and mixed attention, termed DDMA-GAN, is proposed. Firstly, DDMA-GAN designs a mixed attention mechanism that uses channel attention and spatial attention to fully capture image feature information. Secondly, to solve the problem of discrimination error with a single discriminator, a dual discriminator structure is proposed: a fusion coefficient is used to fuse the two judgments so that the returned parameters are more objective, and a data augmentation module is embedded to further improve the robustness of the model. Finally, the hinge loss is used as the loss function to enlarge the margin between real and fake samples. The model is verified on the public datasets LSUN and CelebA. Experimental results show that images generated by DDMA-GAN on these classical datasets are more realistic, and the FID and MMD of DDMA-GAN are significantly reduced, which fully indicates the validity of the model.
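    A rough sketch of how a dual-discriminator hinge loss with a fusion coefficient can be written is shown below. The blending rule and the coefficient name alpha are assumptions; the paper's exact fusion of the two discriminators' judgments is not given in the abstract.

        import torch
        import torch.nn.functional as F

        def fused_hinge_d_loss(d1_real, d2_real, d1_fake, d2_fake, alpha: float = 0.5):
            """Discriminator hinge loss where two discriminators' scores are blended
            with a fusion coefficient alpha before the margin is applied."""
            real = alpha * d1_real + (1 - alpha) * d2_real
            fake = alpha * d1_fake + (1 - alpha) * d2_fake
            return F.relu(1.0 - real).mean() + F.relu(1.0 + fake).mean()

        def fused_hinge_g_loss(d1_fake, d2_fake, alpha: float = 0.5):
            """Generator hinge loss against the fused discriminator score."""
            return -(alpha * d1_fake + (1 - alpha) * d2_fake).mean()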
    Image Feature Classification Based on Multi-Agent Deep Reinforcement Learning
    ZHANG Zewei, ZHANG Jianxun, ZOU Hang, LI Lin, NAN Hai
    Computer Engineering and Applications    2024, 60 (7): 222-228.   DOI: 10.3778/j.issn.1002-8331.2211-0129
    To address the high complexity of input image data in machine learning tasks such as image feature recognition and classification, a multi-agent deep reinforcement learning method for image feature classification is proposed. Firstly, the image feature classification task is formulated as a partially observable Markov decision process. Multiple moving homogeneous agents collect parts of the image, and the method studies how the agents form a local understanding of the image and take actions, and how relevant features are extracted and classified from locally observed patches, so as to reduce data complexity and filter out irrelevant data. Secondly, an improved value function decomposition method is used to train the agents' policy networks, dividing the global return of the environment according to the contribution of each agent, thereby addressing the credit assignment problem among agents. The proposed method is verified on the MNIST handwritten digits dataset and the NWPU-RESISC45 remote sensing image dataset. Compared with baseline algorithms, it learns more effective association strategies, and the classification process is more stable with improved accuracy.
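    The abstract describes an improved value-function decomposition that splits the global return by each agent's contribution. For orientation only, the sketch below shows the plain additive (VDN-style) decomposition that such methods start from; it does not include the contribution weighting described above.

        import torch
        import torch.nn.functional as F

        def vdn_td_loss(per_agent_q, per_agent_q_next, reward, done, gamma: float = 0.99):
            """TD loss with additive value decomposition: the joint Q-value is the
            sum of each agent's Q-value for its chosen action."""
            # per_agent_q, per_agent_q_next: (batch, n_agents) chosen-action values
            q_tot = per_agent_q.sum(dim=1)
            q_tot_next = per_agent_q_next.sum(dim=1).detach()
            target = reward + gamma * (1.0 - done) * q_tot_next
            return F.mse_loss(q_tot, target)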
    Camouflage Object Detection Algorithm Based on Edge Attention and Reverse Orientation
    HE Wenhao, GE Haibo, CHENG Mengyang, AN Yu, MA Sai
    Computer Engineering and Applications    2024, 60 (7): 229-237.   DOI: 10.3778/j.issn.1002-8331.2211-0211
    Camouflage object detection (COD) has important application value in many fields. The existing COD algorithm mainly focuses on the expression of the features extracted from the backbone network and the problem of feature fusion, ignoring the problems of focusing on the edge features of the object and inferring the real area of the object. Aiming at the above problems, a camouflaged object detection algorithm based on edge attention and reverse positioning is proposed. The algorithm consists of edge attention module (EAM), close integration module (CIM) and reverse positioning module (RPM). First, the EAM module is used in the feature encoding stage to enhance the expression of multi-level features extracted from the Res2Net-50 backbone network and highlight edge features. Then, the CIM module is used for the fusion of multi-level features to reduce the loss of feature information. Finally, the RPM module is used to process the rough prediction maps from different feature pyramids, reverse localize the real region of the object, and infer the real object. Experiments on 3 public datasets show that the proposed algorithm outperforms the other 8 state-of-the-art models. On the COD10K dataset, the mean absolute error (MAE) reaches 0.038.
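    The reverse positioning module is not detailed in the abstract; the fragment below sketches the related reverse-attention idea, in which features are weighted by the complement of the coarse prediction so that later stages refine regions not yet claimed as object. It is illustrative only.

        import torch

        def reverse_attention(features: torch.Tensor, coarse_pred: torch.Tensor) -> torch.Tensor:
            """Weight features by 1 - sigmoid(coarse prediction) so the next stage
            attends to regions the coarse map has not yet marked as object."""
            # features: (B, C, H, W); coarse_pred: (B, 1, H, W) logits
            reverse_weight = 1.0 - torch.sigmoid(coarse_pred)
            return features * reverse_weight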
    Learning Gaussian-Aware Constraint Spatial Anomaly for Correlated Filter Target Tracking
    JIANG Wentao, WANG Zimin, ZHANG Shengchong
    Computer Engineering and Applications    2024, 60 (7): 238-247.   DOI: 10.3778/j.issn.1002-8331.2211-0408
    To address target loss during tracking under complicated motion, a target tracking algorithm with Gaussian-aware constrained spatial anomaly is proposed. Firstly, the feature sampling points of the target are established following a Gaussian uniform distribution, and the appearance model and weight model of the target are extracted with a convolutional structure. Secondly, to constrain spatial anomalies, spatial regularization terms are constructed in the objective function; at the same time, the target weight model is updated to minimize spatial overfitting, thereby enhancing the tracker's adaptability to spatial anomalies. Lastly, the weighted least squares method is applied to locate the center of the weight response model, which determines the target center and updates the tracking position, thereby enhancing the robustness of the tracker. On the OTB2015 and UAV20L datasets, the proposed algorithm, compared with other mainstream correlation filtering algorithms, achieves high tracking success rate and tracking accuracy under complicated conditions such as low resolution and occlusion caused by target motion.
    Hand Pose Estimation Based on Multi-Feature Enhancement
    FENG Xinxin, GAO Shu
    Computer Engineering and Applications    2024, 60 (6): 207-213.   DOI: 10.3778/j.issn.1002-8331.2210-0089
    Hand pose estimation is one of the important research directions of computer vision and plays an important role in application fields such as human-computer interaction, virtual reality and robot control. At present, hand pose estimation suffers from a single feature representation method. This paper proposes a feature construction method based on the connection relationships between hand key points and a key point feature aggregation and enhancement method based on the semantic relationships of hand motion, to improve hand feature representation and information sharing. To deal with occlusion in hand target detection and image segmentation, a hand contour feature extraction method is designed to improve the preprocessing effect. Based on the proposed multi-feature representation and enhancement methods, a deep learning neural network model with a fully convolutional structure is constructed to avoid the loss of spatial information caused by directly regressing 3D pose information, thus effectively improving the accuracy of 3D hand pose estimation. Compared with SOTA models on the DO, ED and RHD datasets, it achieves competitive results, with an average AUC of 93.3%, indicating that the proposed method also generalizes well.
    Commonsense Oriented Fine-Grained Data Augmentation
    LI Huachao, KANG Bin, WANG Lei
    Computer Engineering and Applications    2024, 60 (6): 214-221.   DOI: 10.3778/j.issn.1002-8331.2210-0361
    Representative research on data augmentation has mainly been carried out on common classification benchmark datasets such as ImageNet. Because the intra-class and inter-class relations in fine-grained visual classification (FGVC) datasets differ greatly from those in ordinary classification datasets, data augmentation methods for FGVC need further study. Therefore, starting from the fine-grained recognition task and the special properties of its datasets, this paper proposes a commonsense-guided fine-grained semantic image patch mixing method (ComSipmix). The proposed method exploits commonsense knowledge to mine potential associations between sample labels and, based on this, designs a multi-branch convolutional neural network structure for a structured image mixing strategy, so that the image mixing process pays more attention to the subtle differences between targets. Extensive performance tests verify that the proposed method significantly outperforms mainstream image-mixing-based data augmentation methods. At the same time, experiments confirm that the commonsense knowledge introduced in this paper helps improve the performance of various data augmentation models based on mixed images.
    CME-Based Few-Shot Detection Model with Enhanced Multiscale Deep Features
    DING Zhengwei, BAI Hexiang, HU Shen
    Computer Engineering and Applications    2024, 60 (6): 222-229.   DOI: 10.3778/j.issn.1002-8331.2211-0419
    A CME-based few-shot detection model with enhanced multiscale deep features is proposed to address the problems that existing few-shot detection models take insufficient account of the global semantic information of images and that detector performance degrades with varying input image sizes. Firstly, the model is trained with a large amount of labeled base-class data using a multilayer convolutional neural network with residual skip connections and a multiscale feature enhancement module with good generalization; the model is then fine-tuned with a small amount of labeled new-class data together with base-class data, and finally the fine-tuned model is used for target detection. To verify the effectiveness of the model, the VOC2007 and VOC2012 datasets are used for training and evaluation. Ablation experiments demonstrate that the multilayer convolutional neural network with residual skip connections and the multiscale feature enhancement module each improve the accuracy of the model, both individually and in combination. Comparison experiments with six representative few-shot target detection models show that CME with enhanced multiscale deep features outperforms the state-of-the-art detector by an average of 4.75 percentage points.
    Small Target-Oriented Multi-Space Hierarchical Helmet Detection
    LI Jiaxin, HU Yang, HUANG Xiezhou, LI Hongjun
    Computer Engineering and Applications    2024, 60 (6): 230-237.   DOI: 10.3778/j.issn.1002-8331.2210-0353
    Because factors such as small targets and long distances in the target video affect the detection effect, small targets are difficult to capture. A multi-spatial hierarchical helmet-wearing detection algorithm for small targets is proposed in this article, tailored and improved on the basis of the YOLOv5s network model. Firstly, a multi-spatial attention module is designed to consider the effects of spatial features from different perspectives and fuse them, strengthening the spatial location relationships of salient features. Secondly, features at multiple spatial scales are fused while combining multiple features in the feature extraction process, adapting to the capture of targets at different spatial levels and improving the detection of small targets. Thirdly, data augmentation is used to improve the generalizability of the dataset so that the training targets cover more diverse scenarios. Finally, the loss function is optimized to enhance the regression capability and improve the training effect. The experimental results show that the proposed algorithm achieves an average accuracy of 91.5% and significantly reduces missed detections. In addition, the proposed algorithm has been deployed on real construction sites and shows superior performance in detecting small targets, which is of great practical value.
    Expression Recognition Combining 3D Interactive Attention and Semantic Aggregation
    WANG Guangyu, LUO Xiaoshu, XU Zhaoxing, FENG Fangyu, XU Jiangjie
    Computer Engineering and Applications    2024, 60 (6): 238-248.   DOI: 10.3778/j.issn.1002-8331.2210-0398
    A facial expression recognition method combining 3D augmented attention and semantic aggregation is proposed to address the problems that traditional convolutional networks are difficult to effectively integrate features of facial expressions of faces at different stages, have feature expression bottlenecks and cannot efficiently utilize contextual semantics. Firstly, it is optimized on the basis of rank expansion (ReXNet) network to fuse contextual features while eliminating expression bottlenecks to make it more suitable for expression recognition tasks. Secondly, to capture discriminative face expression fine-grained features, 3D augmented attention is constructed by combining non-local blocks with cross-dimensional information interaction theory. Finally, in order to fully utilize the shallow and mid-level underlying features and high-level semantic features of expressions, a semantic aggregation module is designed to aggregate multi-level global contextual features with high-level semantic information to achieve mutual semantic gain of expressions of the same class and enhance intra-class consistency. Experiments show that the accuracy of the method is 88.89%, 89.53% and 62.22% on the publicly available datasets RAF-DB, FERPlus and AffectNet-8, respectively, demonstrating the advancedness of the method.
    Semi-Supervised Object Detection Algorithm Based on Localization Confidence Weighting
    FENG Zeheng, WANG Feng
    Computer Engineering and Applications    2024, 60 (6): 249-258.   DOI: 10.3778/j.issn.1002-8331.2210-0400
    Wavelet Frequency Division Self-Attention Transformer Image Deraining Network
    FANG Siyan, LIU Bin
    Computer Engineering and Applications    2024, 60 (6): 259-273.   DOI: 10.3778/j.issn.1002-8331.2211-0099
    In view of the weak ability of vision Transformer (ViT)  to capture high-frequency information and the problem that many image deraining methods are prone to lose details, a wavelet frequency division self-attention Transformer image deraining network (WFDST-Net)  is proposed. As the main module of WFDST-Net, the wavelet frequency division self-attention Transformer (WFDST)  uses non-separable lifting wavelet transform to obtain the low-frequency and high-frequency components of feature map, and carries out self-attention interaction in the low frequency and high frequency respectively, so that the module can learn from the low frequency to restore the overall structure, and strengthen the ability to capture line details such as rain streaks in the high frequency, thus enhancing the modeling ability of different frequency domain features. WFDST-Net adopts U-shaped architecture and obtains multi-scale features through non-separable lifting wavelet transform, which can capture high-frequency rain streaks of different shapes while ensuring the integrity of information. WFDST-Net has lower parameters than other Transformers related to image deraining. In addition, the VOCRain250 dataset is proposed for the task of joint image deraining and semantic segmentation, which has advantages over the currently widely used BDD150. The experimental results show that the proposed method enhances the ability of ViT to capture different frequency domain information, and outperforms the current state-of-the-art deraining methods in the performance of synthetic and real-world datasets and joint semantic segmentation tasks. It can effectively remove complex rain streaks while retaining more background details.
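    The paper uses a non-separable lifting wavelet transform; purely to illustrate the frequency-division step that precedes the low- and high-frequency self-attention, the sketch below splits a feature map with a plain one-level Haar transform (sign conventions may differ from the paper's transform).

        import torch

        def haar_dwt(x: torch.Tensor):
            """One-level 2D Haar DWT of a (B, C, H, W) tensor with even H and W.
            Returns the low-frequency band LL and high-frequency bands LH, HL, HH."""
            a = x[..., 0::2, 0::2]  # top-left of each 2x2 block
            b = x[..., 0::2, 1::2]  # top-right
            c = x[..., 1::2, 0::2]  # bottom-left
            d = x[..., 1::2, 1::2]  # bottom-right
            ll = (a + b + c + d) / 2
            lh = (a + b - c - d) / 2
            hl = (a - b + c - d) / 2
            hh = (a - b - c + d) / 2
            return ll, lh, hl, hh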
    Lightweight Object Detection Method for Constrained Environments
    QU Haicheng, YUAN Xudong, LI Jiaqi
    Computer Engineering and Applications    2024, 60 (6): 274-281.   DOI: 10.3778/j.issn.1002-8331.2211-0283
    The lightweight design of object detection models plays an important role in environments with limited computing resources and storage space. To further compress the size of object detection models and improve detection accuracy, a higher-performance lightweight object detection model named Lite-YOLOX is proposed, which improves the feature pyramid structure, the decoupled head structure and the loss function of the YOLOX-Tiny model. Firstly, to further compress the size of the original model, the feature pyramid and decoupled head are redesigned to make the neck and head of the model lighter. Then, to improve detection accuracy, the EIoU loss function, which is more sensitive to the position of the ground-truth box, is designed to optimize the proposed model. Finally, validation experiments are performed on the Pascal VOC and safety-helmet-wearing datasets. The experimental results show that, compared with YOLOX-Tiny, Lite-YOLOX reduces the parameters by 40% and the floating-point operations by 37.5%, while mAP50 increases by 3.2 and 3.1 percentage points on the two datasets. On the NVIDIA Jetson Xavier NX, the frames per second (FPS) increases from 51 to 59, a significant improvement in real-time performance.
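    The EIoU loss mentioned above adds explicit center-distance, width and height penalties to the IoU term. A sketch of the published EIoU formulation for boxes in (x1, y1, x2, y2) form is given below; it is not taken from the paper.

        import torch

        def eiou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
            """EIoU loss for box tensors of shape (N, 4) in (x1, y1, x2, y2) format."""
            # intersection and union
            ix1 = torch.max(pred[:, 0], target[:, 0]); iy1 = torch.max(pred[:, 1], target[:, 1])
            ix2 = torch.min(pred[:, 2], target[:, 2]); iy2 = torch.min(pred[:, 3], target[:, 3])
            inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
            area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
            area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
            iou = inter / (area_p + area_t - inter + eps)
            # smallest enclosing box
            ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
            ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
            cw, ch = ex2 - ex1, ey2 - ey1
            # center-distance plus separate width and height penalties (EIoU)
            cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
            cxt, cyt = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
            rho2 = (cxp - cxt) ** 2 + (cyp - cyt) ** 2
            wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
            wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
            loss = (1 - iou
                    + rho2 / (cw ** 2 + ch ** 2 + eps)
                    + (wp - wt) ** 2 / (cw ** 2 + eps)
                    + (hp - ht) ** 2 / (ch ** 2 + eps))
            return loss.mean()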
    Multi-Object Tracking with Spatial-Temporal Embedding Perception and Multi-Task Synergistic Optimization
    LIANG Xiaoguo, LI Hui, CHENG Yuanzhi, CHEN Shuangmin, LIU Hengyuan
    Computer Engineering and Applications    2024, 60 (6): 282-292.   DOI: 10.3778/j.issn.1002-8331.2211-0385
    To solve the tracking challenges caused by frequent occlusion, crowded scenes and variable object scales in multi-object tracking, a multi-object tracking method is proposed via spatial-temporal embedding perception and multi-task synergistic optimization. Firstly, spatial correlation module is proposed to extract discriminative embedding with object context awareness in spatial. Secondly, temporal correlation module is proposed to aggregate the embedding extracted from spatial correlation module, and aggregated embedding is used to generate temporal attention to guide spatial correlation module to extract more discriminative embedding in frequent occlusion and crowded scenes. Therefore, discriminative embedding enhances association robustness while predicting more accurate detection box to overcome the scale variability issues, and accurate detection box facilitates the extraction of higher quality embedding for the proposed modules. In this way, the synergistic optimization among multiple tasks of embedding extraction, position prediction and data association is achieved. Finally, GIoU distance among detection boxes is introduced into the affinity matrix to further improve association robustness in occlusion and crowded scenes. Experimental results on MOT16, MOT17 and MOT20 datasets show that the proposed method exhibits superior tracking performance to state-of-the-art methods.
    Research on Pedestrian Multi-Object Tracking Algorithm Under OMC Framework
    HE Yuting, CHE Jin, WU Jinman, MA Pengsen
    Computer Engineering and Applications    2024, 60 (5): 172-182.   DOI: 10.3778/j.issn.1002-8331.2211-0344
    Multi-object tracking is an important and widely studied direction in computer vision, but in practical applications, rapid target motion, lighting changes and occlusion lead to poor tracking performance. Therefore, the multi-object tracking model OMC is used as the basic framework for research aimed at further improving tracking performance. Firstly, to address the uneven quality of target features in multi-object tracking, the feature extractor is optimized by integrating the GAM attention mechanism into the backbone network and replacing the upsampling method in the neck network. Secondly, to address the competition between the detection and re-identification tasks in existing methods, a recursive cross-correlation network is constructed so that the model can learn the characteristics and commonalities of the different tasks. The two sub-tasks are then optimized separately: on the one hand, a new channel attention module HS-CAM is designed to optimize the re-identification network; on the other hand, the bounding box regression loss of the detection branch is replaced with the EIoU loss function. Experiments show that on the MOT16 dataset the MOTA metric reaches 73.5%, IDF1 reaches 70.4%, and ML is 11.7%, a 1.5-percentage-point reduction compared with the OMC algorithm.
    UAV Small Object Detection Algorithm Based on Context Information and Feature Refinement
    PENG Yanfei, ZHAO Tao, CHEN Yankang, YUAN Xiaolong
    Computer Engineering and Applications    2024, 60 (5): 183-190.   DOI: 10.3778/j.issn.1002-8331.2305-0401
    Object detection in UAV aerial images has been a research hotspot in recent years. Aiming at the low detection accuracy caused by small, dense objects and complex backgrounds in the UAV perspective, a UAV small object detection algorithm based on context information and feature refinement is proposed. Firstly, a context feature enhancement module uses multi-scale dilated convolution to capture the potential relationships between a pixel and its surrounding area, supplementing the network with context information. According to the feature layers of different scales, the output weights of each level of feature map are adaptively generated to dynamically optimize the expressive ability of the feature maps. Secondly, because different feature maps have different levels of detail, a feature refinement module is used to suppress conflicting information in feature fusion and prevent small object features from being drowned out. Finally, a weighted loss function is designed to accelerate the convergence of the model and further improve the accuracy of small object detection. Extensive experiments on the VisDrone2021 dataset show that the improved model gains 8.4 percentage points in mAP50 and 5.9 percentage points in mAP50:95 over the baseline model, with an FPS of 42, effectively improving the detection accuracy of small objects in UAV aerial images.
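    The context feature enhancement module described above relies on multi-scale dilated convolution; a minimal sketch of such a block is shown below, with the dilation rates and the 1x1 fusion convolution chosen as illustrative assumptions (the paper's adaptive per-level weighting is not reproduced).

        import torch
        import torch.nn as nn

        class DilatedContextBlock(nn.Module):
            """Parallel dilated convolutions gather context at several receptive
            fields; the branch outputs are concatenated and fused back to in_ch."""
            def __init__(self, in_ch: int, dilations=(1, 2, 4, 8)):
                super().__init__()
                self.branches = nn.ModuleList([
                    nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=d, dilation=d)
                    for d in dilations])
                self.fuse = nn.Conv2d(in_ch * len(dilations), in_ch, kernel_size=1)

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))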
    Re-Parameterized YOLOv8 Pavement Disease Detection Algorithm
    WANG Haiqun, WANG Bingnan, GE Chao
    Computer Engineering and Applications    2024, 60 (5): 191-199.   DOI: 10.3778/j.issn.1002-8331.2309-0354
    Road disease detection is an important way to ensure people’s traffic safety. In order to improve the accuracy of road disease detection and achieve timely and accurate road disease detection, a pavement disease detection model of re-parameterized YOLOv8 is proposed. First of all, CNX2f module is introduced into the backbone network to improve the ability of the network to extract pavement disease features, and effectively solve the problem that the pavement disease features are easily confused with the background environmental features. Secondly, RepConv and DBB reparameterization modules are introduced to enhance the capability of multi-scale feature fusion and solve the problem of large scale difference of pavement diseases. At the same time, the shared parameter structure of the head is improved, and RBB reparameterization module is introduced to solve the problem of head parameter redundancy and improve the feature extraction capability. Finally, the SPPF_Avg module is introduced to solve the problem of pavement feature loss and enrich the multi-scale feature expression. The experimental results show that the accuracy of the improved road disease detection network is 73.3%, the recall rate is 62.3% and the mAP is 69.3%, which is 2.6, 3.0 and 2.8 percentage points higher than that of the YOLOv8 network, and the detection effect of the model is improved.
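    RepConv, DBB and RBB all rest on structural re-parameterization: parallel branches used during training are folded into a single convolution for inference. The sketch below shows only the simplest case, merging a 1x1 branch into a 3x3 kernel (BatchNorm folding omitted); it illustrates the general idea, not the paper's modules.

        import torch
        import torch.nn.functional as F

        def fuse_3x3_1x1(w3, b3, w1, b1):
            """Fold a parallel 1x1 conv into a 3x3 conv by zero-padding the 1x1
            kernel to 3x3 and summing weights and biases."""
            w1_padded = F.pad(w1, [1, 1, 1, 1])  # place the 1x1 tap at the kernel center
            return w3 + w1_padded, b3 + b1

        # Sanity check: the fused kernel reproduces the two-branch output.
        x = torch.randn(1, 8, 16, 16)
        w3, b3 = torch.randn(16, 8, 3, 3), torch.randn(16)
        w1, b1 = torch.randn(16, 8, 1, 1), torch.randn(16)
        two_branch = F.conv2d(x, w3, b3, padding=1) + F.conv2d(x, w1, b1)
        wf, bf = fuse_3x3_1x1(w3, b3, w1, b1)
        assert torch.allclose(two_branch, F.conv2d(x, wf, bf, padding=1), atol=1e-4)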
    Traffic Sign Detection Algorithm Based on Improved YOLOv5-S
    LIU Haibin, ZHANG Youbing, ZHOU Kui, ZHANG Yufeng, LYU Sheng
    Computer Engineering and Applications    2024, 60 (5): 200-209.   DOI: 10.3778/j.issn.1002-8331.2306-0293
    In the field of autonomous driving, existing traffic sign detection methods suffer from missed or incorrect detections in complex backgrounds, reducing the reliability of intelligent vehicles. To address this issue, a real-time traffic sign detection algorithm based on an enhanced YOLOv5-S is proposed. Firstly, the coordinate attention mechanism is integrated into the feature extraction network to perceive the location of the object by establishing long-range dependencies on the target, making the algorithm focus on high-priority regions. Secondly, the Focal-EIoU loss function replaces CIoU, allowing the network to focus more on high-quality classification samples, improving its ability to learn from difficult samples and reducing missed and false detections. Next, the lightweight convolution technique GSConv is integrated into the network to reduce model complexity. Finally, a new small-target detection layer is added to improve the detection of small-sized signs by using richer feature information. The experimental results show that the improved algorithm achieves 88.1% mAP@0.5 and 68.5% mAP@0.5:0.95 with a detection speed of 83 FPS, meeting the requirements of real-time and reliable detection.
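    The coordinate attention mechanism referenced above pools along the two spatial axes separately so that positional information is preserved in the attention map. A compact PyTorch sketch of that published structure follows; the reduction ratio and the use of mean pooling are assumed details.

        import torch
        import torch.nn as nn

        class CoordinateAttention(nn.Module):
            """Coordinate attention: directional pooling along H and W, a shared
            encoding, then per-axis attention maps that re-weight the input."""
            def __init__(self, channels: int, reduction: int = 32):
                super().__init__()
                mid = max(8, channels // reduction)
                self.encode = nn.Sequential(
                    nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
                self.conv_h = nn.Conv2d(mid, channels, 1)
                self.conv_w = nn.Conv2d(mid, channels, 1)

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                b, c, h, w = x.shape
                pool_h = x.mean(dim=3, keepdim=True)                       # (b, c, h, 1)
                pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (b, c, w, 1)
                y = self.encode(torch.cat([pool_h, pool_w], dim=2))        # shared 1x1 encoding
                y_h, y_w = torch.split(y, [h, w], dim=2)
                a_h = torch.sigmoid(self.conv_h(y_h))                      # (b, c, h, 1)
                a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (b, c, 1, w)
                return x * a_h * a_w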
    Multi-Coupled Feedback Networks for Image Fusion and Super-Resolution Methods
    WANG Rong, DUANMU Chunjiang
    Computer Engineering and Applications    2024, 60 (5): 210-220.   DOI: 10.3778/j.issn.1002-8331.2212-0118
    People often need high-dynamic-range, high-resolution images in daily life. However, owing to equipment limitations, high-dynamic-range images are usually obtained by multi-exposure fusion (MEF) of low-dynamic-range images, and high-resolution images by super-resolution (SR) of low-resolution images; MEF and SR are usually studied separately. To solve the problem that current models cannot achieve high dynamic range and high resolution at the same time, a multi-coupling feedback network (MCF-Net) and its method are proposed based on a study of existing approaches. The model consists of N subnets and output modules. In the method, the N downsampled images I_i^lr, I_m^lr, I_-i^lr are first input to the N subnets respectively and the low-resolution features F_i^lr, F_m^lr, F_-i^lr are extracted; the super-resolution features G_i^0, G_m^0, G_-i^0 of the corresponding images are then extracted from the low-resolution features; the fused high-resolution features G_i^t, G_m^t, G_-i^t are obtained and passed to the next MCFB until the T-th MCFB yields the fused high-resolution features G_i^T, G_m^T, G_-i^T, from which the corresponding fused super-resolution images I_i^t, I_m^t, I_-i^t are obtained; finally, the high-dynamic-range super-resolution image I_out is obtained by fusing the outputs I_i^T, I_m^T, I_-i^T of the T-th reconstruction module REC in the N subnets. The performance is evaluated on the SICE dataset and compared with 33 existing methods. The results show significant improvements in each of the following metrics: structural similarity (SSIM) reaches 0.8332, peak signal-to-noise ratio (PSNR) reaches 22.07 dB, and multi-exposure fusion similarity (MEF-SSIM) reaches 0.9378.
    Image Super-Resolution Reconstruction Algorithm with Adaptive Aggregation of Hierarchical Information
    CHEN Weijie, HUANG Guoheng, MO Fei, LIN Junyu
    Computer Engineering and Applications    2024, 60 (5): 221-231.   DOI: 10.3778/j.issn.1002-8331.2210-0155
    With the development of convolutional neural networks, image super-resolution reconstruction algorithms have made some breakthroughs. Nevertheless, the existing image super-resolution algorithms rarely distinguish the use of hierarchical features and suffer from the problem of costly multi-scale feature extraction. To address these problems, this paper proposes an image super-resolution reconstruction algorithm with adaptive aggregation of hierarchical information. Specifically, the algorithm applies a multi-level information refinement mechanism for the adaptive enhancement of features at different levels to solve the problem that the hierarchical features are not distinguishably utilized. In addition, it is proposed to construct a fine-grained multi-scale information aggregation block to solve the problem of costly multi-scale information extraction and poor feature representation capability. Finally, the algorithm focuses on contrast-enhanced recombinant attention blocks to achieve adaptive calibration of features at a lower cost by exploiting channel and spatial information. Extensive experiments show that compared with other advanced algorithms, the proposed method can achieve better metrics and visual results on five benchmark datasets such as Urban100.
    Improved UNet++ for Tree Rings Segmentation of Chinese Fir CT Images
    LIU Shuai, GE Zhedong, LIU Xiaotong, GAO Yisheng, LI Yang, LI Mengfei
    Computer Engineering and Applications    2024, 60 (5): 232-239.   DOI: 10.3778/j.issn.1002-8331.2210-0212
    To address the difficulty of accurately segmenting tree rings affected by defects such as cracks, wormholes and knots, a medical CT scanner is used to reconstruct 125 CT images of Chinese fir cross-sections, and these images form the dataset, which is expanded by preprocessing such as cropping, rotating and flipping the CT images. An improved UNet++ model is proposed for tree ring segmentation. Convolutional blocks, downsampling layers, skip connections and upsampling layers are added to the improved UNet++ model, and the learning depth is increased to 6 layers. BCEWithLogitsLoss, ReLU and RMSProp are used as the loss function, activation function and optimizer respectively. The improved UNet++ model is used to segment the tree rings of the CT-reconstructed Chinese fir cross-sections, and its performance is evaluated. The results show that the pixel accuracy of the improved UNet++ model is 97.81%, the Dice coefficient is 98.89%, the intersection over union is 95.29%, and the mean intersection over union is 84.75%; the best segmentation results are obtained by fully extracting the characteristics of Chinese fir tree rings. Compared with the U-Net and UNet++ models, the improved UNet++ model produces complete and continuous segmented rings: although most rings are cut by cracks and wormholes and cannot form a complete closed curve, fractures and noise are eliminated. The results show that the improved UNet++ model is not affected by defects such as cracks, knots and wormholes, and its segmentation results are very clear, effectively solving the mis-segmentation and under-segmentation of dense tree rings under the interference of wormhole defects.
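    For reference, the evaluation metrics quoted above (pixel accuracy, Dice coefficient and intersection over union) can be computed from binary masks as sketched below; this is illustrative and not the authors' evaluation code.

        import numpy as np

        def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
            """Pixel accuracy, Dice coefficient and IoU for binary 0/1 masks."""
            pred, gt = pred.astype(bool), gt.astype(bool)
            tp = np.logical_and(pred, gt).sum()
            pixel_acc = (pred == gt).mean()
            dice = 2 * tp / (pred.sum() + gt.sum() + eps)
            iou = tp / (np.logical_or(pred, gt).sum() + eps)
            return pixel_acc, dice, iou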
    Facial Expression Generation Based on Group Residual Block Generative Adversarial Network
    LIN Benwang, ZHAO Guangzhe, WANG Xueping, LI Hao
    Computer Engineering and Applications    2024, 60 (5): 240-249.   DOI: 10.3778/j.issn.1002-8331.2210-0234
    Facial expression generation is the generation of facial images with expressions through a certain expression calculation method, which is widely used in face editing, film and television production, and data augmentation. With the advent of generative adversarial network (GAN), facial expression generation has made significant progress, but problems such as overlapping, blurring, and lack of realism still occur in facial expression generation images. In order to address the above issues, group residuals with attention mechanism generative adversarial network (GRA-GAN) is proposed to generate high-quality facial expressions. Firstly, an adaptive mixed attention mechanism (MAT) is embedded in the generative network before downsampling and after upsampling to adaptively learn the key region features and enhance the learning of key regions of the image. Secondly, the idea of grouping is integrated into the residual network, and the group residuals block with attention mechanism (GRA) module is proposed to achieve better generation effect. Finally, the experimental verification is carried out on the public dataset RaFD. The experimental results show that the proposed GRA-GAN outperforms the related methods in both qualitative and quantitative analysis.
    Improving YOLOX-s Dense Garbage Detection Method
    XIE Ruobing, LI Maojun, LI Yiwei, HU Jianwen
    Computer Engineering and Applications    2024, 60 (5): 250-258.   DOI: 10.3778/j.issn.1002-8331.2210-0235
    To address the low recognition rate, inaccurate localization, and false and missed detections of targets in densely stacked multi-category garbage detection, a garbage detection method incorporating a multi-headed self-attention mechanism to improve YOLOX-s is proposed. Firstly, the Swin Transformer module is embedded in the feature extraction network, introducing a multi-headed self-attention mechanism based on the sliding window operation so that the network considers both global feature information and key feature information, reducing false detections. Secondly, deformable convolution is used in the prediction output network to refine the initial prediction boxes and improve localization accuracy. Finally, loss weighting coefficients are introduced on the basis of EIoU to propose a weighted IoU-EIoU loss, which adaptively adjusts the attention paid to different losses at different stages of training and further accelerates the convergence of the training network. Tests on a public 204-class garbage detection dataset show that the mean average precision of the proposed improved algorithm reaches 80.5% and 92.5% respectively, outperforming current popular target detection algorithms, and the detection speed is fast enough to meet real-time requirements.
    Ship Target Detection Method Combining Visual Saliency and EfficientNetV2
    LIANG Xiuya, FENG Shuichun, CHEN Hongzhen
    Computer Engineering and Applications    2024, 60 (5): 259-270.   DOI: 10.3778/j.issn.1002-8331.2210-0267
    With the increasing resolution of optical remote sensing images, fast and accurate detection of ship targets at sea has become one of the basic challenges of maritime research. To deal with the problems encountered in detection, such as large image size but sparse targets, complex background interference, poor timeliness of target extraction, and the heavy computation of large models, a practical ship detection scheme is proposed. Visual saliency is introduced to effectively accelerate the pre-screening process, and the difference between the ship target area and the background is effectively expressed by wavelet decomposition coefficients, which enhance the directional characteristics of targets while suppressing noise. The saliency map is generated by an improved model based on the phase spectrum of the quaternion Fourier transform (PQFT). In addition, the Gini index is used to guide multi-scale saliency map fusion to enhance scale adaptability and the saliency of small targets. Compared with other saliency methods, the proposed model effectively suppresses the interference of complex environments such as cloud, fog, sea clutter, and ship wakes. More importantly, it produces a smaller set of candidate regions than classical sliding-window or other region proposal methods. After the saliency map is obtained, the adaptive-threshold OTSU method is employed for its binary segmentation. In the target discrimination stage, the lightweight network EfficientNetV2 is used to effectively eliminate false alarms. The experimental results show that the proposed ship detection method is highly robust, with accuracy up to 96%, and meets real-time requirements.
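    The saliency stage above improves on the phase spectrum of the quaternion Fourier transform (PQFT). As a simplified point of reference only, the sketch below computes single-channel phase-spectrum saliency (keep the FFT phase, discard the amplitude, back-transform, square and smooth); the quaternion extension and the paper's improvements are not reproduced.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def phase_spectrum_saliency(gray: np.ndarray, sigma: float = 3.0) -> np.ndarray:
            """Phase-only saliency map of a grayscale image, normalized to [0, 1]."""
            f = np.fft.fft2(gray.astype(np.float64))
            phase_only = np.exp(1j * np.angle(f))        # keep phase, drop amplitude
            sal = np.abs(np.fft.ifft2(phase_only)) ** 2  # back-transform and square
            sal = gaussian_filter(sal, sigma)            # smooth to suppress noise
            return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)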
    Dual-Branch Low-Light Image Enhancement Combined with Dense Wavelet Transform
    CHEN Junjie, ZHOU Yongxia, ZU Jiazhen, SHEN Wei, ZHAO Ping
    Computer Engineering and Applications    2024, 60 (4): 200-210.   DOI: 10.3778/j.issn.1002-8331.2209-0470
    A dual-branch image enhancement method combining dense wavelet transform is proposed to solve the problems of low brightness, high noise, and color distortion in low-light images. Firstly, dense wavelet networks are used for multi-scale feature information fusion to reduce information loss and provide denoising capability. Then, the global attention module and feature extraction module are embedded in the multi-scale feature fusion to fully extract global and local features. Finally, the effect of low-light images is enhanced by color enhancement and detail reconstruction with a dual-branch structure. In addition, a new joint loss function is introduced to guide the network training from multiple aspects to enhance its performance. The experimental results show that the proposed method effectively improves the brightness of low-light images, suppresses image noise, and obtains richer details and color information. The enhanced images are clearer and more natural, and the peak signal-to-noise ratio and structural similarity have significant advantages over the mainstream methods.
    Vectorized Feature Space Embedded Clustering Based on Contrastive Learning
    ZHENG Yang, WU Yongming, XU An
    Computer Engineering and Applications    2024, 60 (4): 211-219.   DOI: 10.3778/j.issn.1002-8331.2209-0338
    The deep embedded clustering (DEC) algorithm embeds data into a low-dimensional vectorized feature space only through an autoencoder with single-instance reconstruction, and ignores the relationships between different instances, so instances in the embedding space may not be well separated. To address this, a vectorized feature space embedded clustering method based on contrastive learning (VECCL) is proposed. Contrastive learning, by identifying the dissimilarity between data instances, extracts features with clustering semantics in which similar instances are close and dissimilar instances are far apart; these are brought into DEC as prior knowledge to guide the autoencoder in initializing a low-dimensional clustering feature space that carries deep information about the data. At the same time, an entropy loss constructed from the soft classification labels and the reconstruction loss of the autoencoder are introduced into the clustering loss function as regularization terms to jointly refine the clustering. Compared with the experimental results of the DEC method on the CIFAR10, CIFAR100 and STL10 datasets, ACC increases by 48.1, 23.1 and 41.8 percentage points, NMI increases by 41.0, 25.2 and 39.0 percentage points, and ARI increases by 45.4, 16.4 and 41.8 percentage points, respectively.
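    The DEC stage referred to above clusters embeddings with a Student's-t soft assignment and a sharpened target distribution. The two standard DEC formulas are sketched below for orientation; the contrastive pre-training and the extra regularization terms of VECCL are not included.

        import torch

        def soft_assignment(z: torch.Tensor, centroids: torch.Tensor, alpha: float = 1.0):
            """DEC soft assignment q_ij: Student's-t similarity between embedding i
            and cluster centroid j, normalized over the clusters."""
            # z: (n, d), centroids: (k, d)
            dist2 = torch.cdist(z, centroids).pow(2)
            q = (1.0 + dist2 / alpha).pow(-(alpha + 1.0) / 2.0)
            return q / q.sum(dim=1, keepdim=True)

        def target_distribution(q: torch.Tensor) -> torch.Tensor:
            """Sharpened target p_ij that emphasizes confident assignments."""
            weight = q.pow(2) / q.sum(dim=0, keepdim=True)
            return weight / weight.sum(dim=1, keepdim=True)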
    Moving Object Detection Algorithm with Unsupervised Missing Value Prediction
    FU Rao, FANG Jiandong, ZHAO Yudong
    Computer Engineering and Applications    2024, 60 (4): 220-228.   DOI: 10.3778/j.issn.1002-8331.2210-0027
    In the process of moving target detection, the background is complex and the target is easily occluded. This paper proposes an autonomous detection algorithm for moving targets based on unsupervised missing value prediction. The missed targets are regarded as missing values in the tag data. According to the prior knowledge of the category and number of objects to be detected, the unsupervised generative adversarial imputation networks (GAIN) are used to predict the missing values through the acquired tag data, which greatly improves the recall rate at the expense of less accuracy. The experimental results on the small sample dataset of the characteristic parts of cattle show when the missing rate of tag data is less than 40%, the accuracy of missing value prediction is about 95%, and the average F1 score of detection is 0.92 for different degrees of occluded targets. This method has good detection performance for moving targets under the condition of small samples, which can reduce the uncertainty in practical application and the dependence of the algorithm on sample data, and improve the problem of missing detection in the process of moving target detection.
    Multi-Scale Detail Enhanced Pyramid Network for Esophageal Lesion Detection
    LI Chi, ZHOU Yingyue, YAO Hanmin, LI Xiaoxia, QIN Jiamin, ZHUANG Ming, WEN Liming
    Computer Engineering and Applications    2024, 60 (4): 229-236.   DOI: 10.3778/j.issn.1002-8331.2209-0162
    Aiming at problems such as high inter-class similarity and large intra-class scale changes in Lugol's chromoendoscopy (LCE) images, this paper proposes a method for detecting multiple esophageal diseases based on Sparse R-CNN and equipped with a multi-scale detail enhancement pyramid network (MDEPN) structure. To address the information loss and semantic differences in the feature pyramid network (FPN) structure of Sparse R-CNN, the MDEPN structure first uses a Gabor-modulated convolution module to enhance features at different scales, exploiting Gabor's strong sensitivity to direction and scale to improve the expression of texture information in the feature maps. Secondly, a directional channel pooling module extracts the local directional similarity and the correlation between local and global features across scales, reducing the semantic differences when fusing features of different scales. On a self-built dataset of multiple esophageal LCE lesions, the mAP0.50 accuracy is 65.0%, which is 2.4 percentage points higher than the baseline Sparse R-CNN and higher than other mainstream detection methods. In addition, the designed MDEPN module can be integrated into other detection models as an independent structure to improve performance, showing a degree of versatility.
    Photovoltaic Panel Segmentation Using Attention Mechanism and Global Convolution
    LI Qing, LI Haitao, LI Hui, ZHANG Junhu
    Computer Engineering and Applications    2024, 60 (4): 237-248.   DOI: 10.3778/j.issn.1002-8331.2209-0180
    Accurate photovoltaic (PV) identification is critical for the effective and healthy development of the PV industry, but PV recognition is hampered by the complex background and the variable shape and color of PV panels in high-resolution remote sensing images. This paper proposes a method for accurately extracting photovoltaic areas from high-resolution remote sensing images. The encoder and decoder of the network combine multi-layer features to aggregate rich semantic information. Important spatial and channel properties are captured using global convolution and a dual attention mechanism, while a channel fusion module recovers some of the lost channel information. The proposed method can effectively solve the problems of blurred edges and adhesion between photovoltaic panels. In experiments on open PV datasets against U-Net, SegNet, DeepLabv3 and DeepLabv3+, the proposed method achieves IoU of 87.02%, 92.98% and 88.43% on PV01, PV03 and PV08 respectively. The experimental results show that the proposed method achieves high-accuracy segmentation of photovoltaic panels in high-resolution remote sensing images.
    Aerial Image Object Detection with Feature Enhancement Using Hybrid Attention
    GUAN Wenqing, ZHOU Shibin, ZHANG Guopeng
    Computer Engineering and Applications    2024, 60 (4): 249-257.   DOI: 10.3778/j.issn.1002-8331.2209-0206
    Aiming at the characteristics of complex background, dense distribution and large scale variation in aerial images, this paper proposes a novel object detection framework named as hybrid attention network (HA-Net). Firstly, Transformer structure both with local and global attention in the backbone network is designed to enhance dense targets feature extraction ability. The Transformer structure uses attention to suppress background noises and make dense target boundaries clearer. Then, a spatial pyramid pooling block using continuous AvgPooling and MaxPooling is adopted to enrich feature information and enhance the multi-scale target representation. Moreover, a feature reconstruction module mixing cross-scale spatial attention and non-local channel attention is designed to reconstruct the feature pyramid network, so as to reduce unnecessary information interference and facilitate multi-scale target detection. The network is evaluated on a large remote sensing dataset DOTA, and the evaluation mAP reaches 76.81% and 78.28% on single-scale test and multi-scale test respectively, which surpasses the baseline model by a large margin of 2.38 percentage points and 3.62 percentage points. The evaluation mAP reaches 89.95% on HRSC2016. The improvement of detection results proves the effectiveness of HA-Net in aerial image object detection.
    Unsupervised Landscape Painting Style Transfer Network with Multiscale Semantic Information
    ZHOU Yuechuan, ZHANG Jianxun, DONG Wenxin, GAO Linfeng, NI Jinyuan
    Computer Engineering and Applications    2024, 60 (4): 258-269.   DOI: 10.3778/j.issn.1002-8331.2210-0079
    To address the texture clutter and poor quality of generated images when image-translation generative adversarial networks handle unsupervised style transfer, this paper proposes CCME-GAN (circulatory correction multiscale evaluation generative adversarial networks) based on the cycle consistency loss. Firstly, in the network architecture, a multiscale evaluation network architecture based on three levels of image semantic information is proposed to enhance the transfer effect from the source domain to the target domain. Secondly, in the loss function, a multiscale adversarial loss and a cyclic correction loss are proposed to guide the optimization of the model toward a stricter target and generate pictures with better visual quality. Finally, to prevent mode collapse, an attention mechanism is added in the encoding stage of style features to extract important feature information, and the ACON activation function is introduced at each stage of the network to strengthen its nonlinear expressive ability and avoid neuron necrosis. The experimental results show that the FID value of the proposed method is reduced by 21.80% and 34.33% compared with CycleGAN and ACL-GAN on a landscape painting style transfer dataset. In addition, to verify the generalization ability of the model, generalization experiments are conducted on two public datasets, Vangogh2Photo and Monet2Photo, where the FID values decrease by 7.58% and 18.14%, and by 4.65% and 6.99%, respectively.
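    CCME-GAN starts from the cycle consistency loss; the standard CycleGAN form of that loss is sketched below for reference (the paper's cyclic correction loss and multiscale adversarial loss are additions on top of it and are not reproduced here).

        import torch

        def cycle_consistency_loss(real_a, rec_a, real_b, rec_b, lam: float = 10.0):
            """L1 cycle-consistency: translating A->B->A (and B->A->B) should
            reconstruct the original images."""
            return lam * (torch.mean(torch.abs(real_a - rec_a)) +
                          torch.mean(torch.abs(real_b - rec_b)))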
    Reference | Related Articles | Metrics
    Multi-Scale Liver Tumor Segmentation Algorithm by Fusing Convolution and Transformer
    CHEN Lifang, LUO Shiyong
    Computer Engineering and Applications    2024, 60 (4): 270-279.   DOI: 10.3778/j.issn.1002-8331.2210-0084
    Abstract84)      PDF(pc) (2544KB)(62)       Save
    Accurate automatic segmentation of the liver and liver tumors is important for helping physicians diagnose, treat, and monitor liver cancer postoperatively. Due to the intrinsic locality of convolution, existing convolution-based methods struggle to establish long-range dependencies, while the Transformer's cascaded attention mechanism can establish global associations but tends to destroy local details. To address this, a feature modeling method that fuses convolution and Transformer is proposed. The method interactively fuses local and global representations through mixed embedding to maximize global dependencies at different resolutions. Meanwhile, contextual information from different encoding stages is captured by a multi-level feature fusion module at the skip connections to obtain richer semantic information. Finally, to cope with the variation of liver tumors in size and shape, a deformable multi-scale module is used to extract multi-scale tumor features. The experiments mainly use the Dice similarity coefficient (DSC) as the evaluation metric. The DSCs for the liver and tumors on the LiTS17 dataset are 0.920 and 0.748, respectively, showing that the proposed network segments liver tumors more accurately than the baseline.
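    A minimal sketch of the Dice similarity coefficient (DSC) used as the evaluation metric; the 0.5 binarization threshold and the smoothing constant are illustrative assumptions.

    import torch

    def dice_coefficient(pred, target, eps=1e-6):
        """DSC = 2|P ∩ T| / (|P| + |T|) over binary masks."""
        pred = (pred > 0.5).float()               # binarize the predicted probabilities
        target = target.float()
        intersection = (pred * target).sum()
        return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

    if __name__ == "__main__":
        p = torch.tensor([[0.9, 0.1], [0.8, 0.2]])
        t = torch.tensor([[1.0, 0.0], [1.0, 1.0]])
        print(dice_coefficient(p, t).item())      # ≈ 0.8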
    Reference | Related Articles | Metrics
    Underwater Image Enhancement Based on Parallel Guidance of Transformer and CNN
    CHANG Jian, CHEN Hongfu, WANG Bingbing
    Computer Engineering and Applications    2024, 60 (4): 280-288.   DOI: 10.3778/j.issn.1002-8331.2302-0036
    Abstract47)      PDF(pc) (2540KB)(39)       Save
    To overcome the problems of low contrast and color deviation in underwater images, a parallel guided underwater image enhancement algorithm based on the Transformer and convolutional neural networks (CNN) is proposed. A 3D position embedding module provides the Transformer with relative position information, color deviation information, and global features of the feature maps, while a CNN encoder extracts local features of the image. The global features extracted by the Transformer and the local features extracted by the CNN are integrated through a feature modulation matrix, the resolution of the image is restored by a CNN decoder, and the feature maps output by the decoder are fed into a feature enhancement network whose output gives the final result. The network is trained on the existing EUVP paired dataset. To verify the superiority of the algorithm, underwater images with varying degrees of color deviation are selected for qualitative and quantitative experiments. The results show that the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) of the enhanced underwater images are higher than those of the comparison algorithms, and the subjective quality is significantly improved; the proposed algorithm can generate enhanced images with rich colors and high clarity.
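    A minimal sketch of the peak signal-to-noise ratio (PSNR) metric reported above, assuming images normalized to [0, 1]; function and variable names are illustrative.

    import torch

    def psnr(enhanced, reference, max_val=1.0):
        """Peak signal-to-noise ratio in dB between an enhanced image and its reference."""
        mse = torch.mean((enhanced - reference) ** 2)
        return 10.0 * torch.log10(max_val ** 2 / mse)

    if __name__ == "__main__":
        ref = torch.rand(3, 64, 64)
        noisy = (ref + 0.05 * torch.randn_like(ref)).clamp(0, 1)   # mildly degraded copy
        print(f"{psnr(noisy, ref).item():.2f} dB")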
    Reference | Related Articles | Metrics
    Improved YOLOv5 Mixed Sample Training for Detection of Insulator Umbrella Plate Falling Defects
    LI Xun, GAN Rundong, QIAN Junfeng, ZHANG Shiheng, ZHAO Wenbin, WANG Daolei
    Computer Engineering and Applications    2024, 60 (4): 289-297.   DOI: 10.3778/j.issn.1002-8331.2302-0165
    Abstract57)      PDF(pc) (2478KB)(49)       Save
    To accurately locate and identify insulator strings and umbrella plate falling defects during transmission line inspection, this paper proposes an insulator defect detection model based on improved YOLOv5 with mixed sample training. Firstly, to address the scarcity of insulator defect images, a mixed sample data generation method is proposed, which combines the GrabCut algorithm with image fusion technology to expand the dataset. Then, according to the shape characteristics of insulators and their defects, the long-edge definition method and the circular smooth label (CSL) are used to redefine the coordinate parameters of the model's feature extraction region; by adding angle information, more accurate feature extraction is achieved. Finally, the CSPDarkNet backbone network is optimized by fusing some feature layers in the backbone with the features extracted by the path aggregation network (PAN). The improved YOLOv5 CSPDarkNet model increases the detection accuracy of insulator defects by 2.8 percentage points compared with the model before improvement, with a detection speed of 20.5 FPS. The experimental results show that the improved insulator defect identification method basically meets the needs of practical application.
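    A rough sketch of how a circular smooth label (CSL) for a box angle can be constructed, so that angles near the 0°/180° boundary remain adjacent; the 180-bin resolution and Gaussian window radius are assumptions for demonstration, not values from the paper.

    import numpy as np

    def circular_smooth_label(angle_deg, num_bins=180, radius=6):
        """Gaussian-smoothed one-hot angle label that wraps around, so 179° and 0° stay adjacent."""
        bins = np.arange(num_bins)
        target = int(round(angle_deg)) % num_bins
        # circular distance between each bin and the target angle bin
        dist = np.minimum(np.abs(bins - target), num_bins - np.abs(bins - target))
        label = np.exp(-(dist ** 2) / (2 * radius ** 2))
        label[dist > radius] = 0.0                 # truncate the window outside the radius
        return label

    if __name__ == "__main__":
        lbl = circular_smooth_label(179)
        print(lbl[179], lbl[0], lbl[90])           # neighbouring bins get soft credit, far bins get none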
    Reference | Related Articles | Metrics
    Class Incremental Learning by Adaptive Feature Consolidation with Parameter Optimization
    XU An, WU Yongming, ZHENG Yang
    Computer Engineering and Applications    2024, 60 (3): 220-227.   DOI: 10.3778/j.issn.1002-8331.2208-0469
    Abstract35)      PDF(pc) (719KB)(32)       Save
    Aiming at the catastrophic forgetting problem of deep network models on image classification tasks in incremental scenarios, a class incremental learning method with adaptive feature consolidation and weight selection is proposed. Firstly, the method uses knowledge distillation as the basic framework to integrate the output features of the backbone and classification networks of the models before and after a task, and uses distillation constraints with a custom disparity loss so that the current model retains the generalization ability of the old model. In the incremental learning phase, the importance of the neural network parameters is evaluated, and changes to important parameters are penalized when learning a new task, thus effectively preventing the new model from overwriting important knowledge related to previous tasks. The experimental results show that the proposed method can exploit the incremental learning ability of the model and effectively alleviate catastrophic forgetting.
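    A minimal sketch of the importance-weighted penalty on parameter changes described above, in the spirit of consolidation-based regularizers; the importance estimates, the function name, and the weighting factor lam are placeholders rather than the authors' exact formulation.

    import torch
    import torch.nn as nn

    def consolidation_penalty(model, old_params, importance, lam=100.0):
        """Penalize movement of parameters deemed important for previous tasks."""
        loss = 0.0
        for name, param in model.named_parameters():
            if name in old_params:
                loss = loss + (importance[name] * (param - old_params[name]) ** 2).sum()
        return lam * loss

    if __name__ == "__main__":
        net = nn.Linear(4, 2)
        old = {n: p.detach().clone() for n, p in net.named_parameters()}   # snapshot after old task
        imp = {n: torch.ones_like(p) for n, p in net.named_parameters()}   # placeholder importance
        print(consolidation_penalty(net, old, imp).item())                 # 0.0 before any update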
    Reference | Related Articles | Metrics
    Improved FCENet Algorithm for Natural Scene Text Detection
    ZHOU Yan, LIAO Junwei, LIU Xiangyu, ZHOU Yuexia, ZENG Fanzhi
    Computer Engineering and Applications    2024, 60 (3): 228-236.   DOI: 10.3778/j.issn.1002-8331.2209-0043
    Abstract31)      PDF(pc) (877KB)(26)       Save
    Aiming at the detection problems caused by complex backgrounds, variable scales, and curved shapes in natural scene text detection, this paper proposes an improved FCENet (Fourier contour embedding network) scene text detection algorithm. The algorithm is based on FCENet and introduces a multi-scale residual feature enhancement module and a multi-scale attention feature fusion module. Serving as the residual branch at the top of the backbone network, the multi-scale residual feature enhancement module strengthens the top-down flow of high-level semantic information in the feature pyramid, improves text pixel classification, and effectively reduces false detections. The multi-scale attention feature fusion module allows features of different semantics and scales to be fused better; combined with the bottom-up feature fusion network, it effectively avoids text over-segmentation and improves the detection of curved text. Experimental results show that the comprehensive F-measure of the proposed method reaches 86.2% and 86.5% on the curved text datasets CTW1500 and Total-Text, respectively, which is 1.1 and 0.7 percentage points higher than the original FCENet.
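    A rough sketch of the Fourier contour representation that FCENet is named after: a closed text contour is expressed by a few complex Fourier coefficients and resampled from them. The coefficient count, ordering, and sampling density here are assumptions for demonstration, not the paper's exact parameterization.

    import numpy as np

    def contour_from_fourier(coeffs, num_points=100):
        """Reconstruct contour points c(t) = sum_k coeffs[k] * exp(2j*pi*k*t) for t in [0, 1)."""
        t = np.linspace(0.0, 1.0, num_points, endpoint=False)
        ks = np.arange(-(len(coeffs) // 2), len(coeffs) // 2 + 1)   # frequencies ..., -1, 0, +1, ...
        basis = np.exp(2j * np.pi * np.outer(ks, t))                # shape (K, num_points)
        contour = coeffs @ basis                                    # complex points x + iy
        return np.stack([contour.real, contour.imag], axis=-1)

    if __name__ == "__main__":
        # a single first-order pair of coefficients gives a circle of radius 1 centred at (2, 3)
        coeffs = np.array([0.0 + 0.0j, 2.0 + 3.0j, 1.0 + 0.0j])     # k = -1, 0, +1
        pts = contour_from_fourier(coeffs)
        print(pts.shape, pts[:2])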
    Reference | Related Articles | Metrics