Computer Engineering and Applications

Review of SLAM Based on Lidar

LIU Mingzhe, XU Guanghui, TANG Tang, QIAN Xiaojian, GENG Ming

Computer Engineering and Applications 2024, 60 (1): 1-14. DOI: 10.3778/j.issn.1002-8331.2308-0455

Simultaneous localization and mapping (SLAM) is a crucial technology for autonomous mobile robots and autonomous driving systems, with a laser scanner (also known as lidar) playing a vital role as a supporting sensor for SLAM algorithms. This article provides a comprehensive review of lidar-based SLAM algorithms. Firstly, it introduces the overall framework of lidar-based SLAM, providing detailed explanations of the functions of the front-end odometry, back-end optimization, loop closure detection, and map building modules, along with a summary of the algorithms used. Secondly, it presents descriptions and summaries of representative open-source algorithms in a sequential order of 2D to 3D and single-sensor to multi-sensor fusion. Additionally, it discusses commonly used open-source datasets, precision evaluation metrics, and evaluation tools. Lastly, it offers an outlook on the development trends of lidar-based SLAM technology from four dimensions: deep learning, multi-sensor fusion, multi-robot collaboration, and robustness research.

Study on Optimization of Cooperative Distribution Path Between UAVs and Vehicles Under Rural E-Commerce Logistics

XU Ling, YANG Linchao, ZHU Wenxing, ZHONG Shaojun

Computer Engineering and Applications 2024, 60 (1): 310-318. DOI: 10.3778/j.issn.1002-8331.2306-0115

Drone delivery has emerged as a significant solution to the challenges of last-mile logistics. The collaborative delivery model between drones and vehicles overcomes the limited delivery capacity of drones and enhances safety, making it a vital approach for drone involvement in the delivery process. To tackle the difficulties and high costs associated with “last-mile” delivery in rural e-commerce logistics, this study constructs a mixed-integer programming model. The objective is to minimize delivery costs under constraints such as the collaborative drone-vehicle mode and multi-drone, multi-parcel delivery. A two-stage algorithm is proposed to optimize the paths for drone-vehicle collaborative delivery. In the first stage, a constrained adaptive K-means algorithm is used to determine the range of vehicle docking points. In the second stage, an improved genetic algorithm that incorporates hill climbing and splitting operators is employed to identify the optimal delivery paths for drones and vehicles. A case study is then conducted to validate the feasibility and effectiveness of the model and algorithm. The research findings are expected to offer novel insights and valuable references for cost reduction and efficiency improvement in last-mile delivery for rural e-commerce logistics.
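
As a rough illustration of the first stage, the sketch below runs plain Lloyd's K-means on customer coordinates to propose candidate vehicle docking points. It is not the constrained adaptive variant described in the abstract; the function name `kmeans`, the sample coordinates, and the fixed initial centroids are assumptions made only for this example.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's K-means on 2D points; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each customer to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical customer coordinates (km) forming two villages.
customers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
             (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
docking_points, assignment = kmeans(customers, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

A constrained variant would presumably add checks such as drone range or vehicle capacity in the assignment step; those constraints are not specified in the abstract and are omitted here.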

Survey of Sentiment Analysis Algorithms Based on Multimodal Fusion

GUO Xu, Mairidan Wushouer, Gulanbaier Tuerhong

Computer Engineering and Applications 2024, 60 (2): 1-18. DOI: 10.3778/j.issn.1002-8331.2305-0439

Sentiment analysis is an emerging technology that aims to explore people’s attitudes toward entities and can be applied to various domains and scenarios, such as product evaluation analysis, public opinion analysis, mental health analysis and risk assessment. Traditional sentiment analysis models focus on text content, yet some special forms of expression, such as sarcasm and hyperbole, are difficult to detect through text alone. As technology continues to advance, people can now express their opinions and feelings through multiple channels such as audio, images and videos, so sentiment analysis is shifting to multimodality, which brings new opportunities. Multimodal sentiment analysis draws on rich visual and auditory information in addition to textual information, and the implied sentiment polarity (positive, neutral, negative) can be inferred more accurately through fusion analysis. The main challenge of multimodal sentiment analysis is the integration of cross-modal sentiment information. This paper therefore focuses on the framework and characteristics of different fusion methods and describes the fusion algorithms that have been popular in recent years. It also discusses multimodal sentiment analysis in small-sample scenarios, together with the current development status, common datasets, feature extraction algorithms, application areas and challenges. It is expected that this review will help researchers understand the current state of research in multimodal sentiment analysis and inspire them to develop more effective models.

Research on Urban Logistics Distribution Mode of Bus-Assisted Drones

PENG Yong, REN Zhi

Computer Engineering and Applications 2024, 60 (7): 335-343. DOI: 10.3778/j.issn.1002-8331.2212-0252

The rapid development of e-commerce is forcing the logistics industry to transform and upgrade continuously. Because local governments encourage the development of public transport and advocate green, low-carbon logistics distribution, a bus-assisted drone distribution mode is studied. After the problem is described, a mathematical model that minimizes distribution cost is constructed, and a smart general variable neighborhood search metaheuristic is designed to solve it. To improve the efficiency of the algorithm, K-means clustering and a greedy algorithm are introduced to generate the initial solution. Firstly, for instances of different scales, several local search strategies and several algorithms are compared to verify the effectiveness of the proposed algorithm. Secondly, using standard CVRP instances, the single-truck distribution mode and the truck-drone collaborative distribution mode are compared with the bus-assisted drone distribution mode to demonstrate its cost and time advantages. Finally, Beijing Bus Rapid Transit Line 2 and its surrounding customer points are selected, and a sensitivity analysis is performed by varying the bus stop spacing and the departure interval. The results show that increasing the stop spacing has a greater impact than changing the departure interval.

Improved YOLOv8s Model for Small Object Detection from Perspective of Drones

PAN Wei, WEI Chao, QIAN Chunyu, YANG Zhe

Computer Engineering and Applications 2024, 60 (9): 142-150. DOI: 10.3778/j.issn.1002-8331.2312-0043

Object detection from the perspective of drones is less precise because image targets are small and densely distributed, classes are unevenly distributed, and hardware conditions limit model size. An improved model based on YOLOv8s with multiple attention mechanisms is proposed. To solve the problem of shared attention weight parameters in receptive field features and to enhance feature extraction, receptive field attention convolution and the CBAM (convolutional block attention module) attention mechanism are introduced into the backbone, adding attention weights in the channel and spatial dimensions. Large separable kernel attention is introduced into the feature pyramid pooling layers to increase information fusion between features at different levels. Feature layers rich in semantic information about small targets are added to improve the neck structure. The inner-IoU loss function is used to improve the MPDIoU (minimum point distance based IoU) function, and the resulting inner-MPDIoU replaces the original loss function to enhance learning on difficult samples. The experimental results show that the improved YOLOv8s model raises mAP, P, and R by 16.1%, 9.3%, and 14.9% respectively on the VisDrone dataset, surpasses YOLOv8m in performance, and can be effectively applied to unmanned aerial vehicle visual detection tasks.

Research on Improving YOLOv7’s Small Target Detection Algorithm

LI Anda, WU Ruiming, LI Xudong

Computer Engineering and Applications 2024, 60 (1): 122-134. DOI: 10.3778/j.issn.1002-8331.2307-0004

With the continued application of deep learning in domestic object detection research, detection of conventional large and medium objects has made remarkable progress. However, owing to the limitations of convolutional networks themselves, small object detection still suffers from missed and false detections. Taking the VisDrone 2019 and FloW-Img datasets as examples, the YOLOv7 model is studied and the ELAN module of its backbone network is improved. The Focal NeXt block is integrated into the long and short gradient paths of the ELAN module to enhance the feature quality of small targets and enrich the contextual information contained in the output features. The RepLKDeXt module is introduced into the head network; it not only replaces the SPPCSPC module to simplify the overall structure of the model, but also optimizes the ELAN-H structure using multi-channel, large convolutional kernels and Cat operations. Finally, the SIoU loss function is introduced to replace the CIoU function to improve the robustness of the model. The results show that the improved YOLOv7 model reduces the number of parameters and the computational complexity, while its detection performance remains approximately unchanged on the VisDrone 2019 dataset, which has a high density of small targets. On the FloW-Img dataset, where small targets are sparse, detection performance increases by 9.05 percentage points. The model is thus further simplified and its applicability increased.

Review of Fault Diagnosis Techniques for UAV Flight Control Systems

AN Xue, LI Shaobo, ZHANG Yizong, ZHANG Ansi

Computer Engineering and Applications 2023, 59 (24): 1-15. DOI: 10.3778/j.issn.1002-8331.2305-0137

In recent years, unmanned aerial vehicles (UAVs) have been widely used in complex military and civilian applications owing to advantages such as low operating costs and high mobility. At the same time, complex and diverse missions have placed higher requirements on the reliability and safety of UAV systems. UAV fault diagnosis technology can provide timely and accurate diagnosis results, which supports the maintenance, repair and servicing of UAVs and is of great significance for enhancing their combat effectiveness. This paper first analyses UAV flight control systems and classifies their faults. Secondly, the research methods and status quo of UAV fault diagnosis technology are analysed and summarised. Finally, the main challenges faced by UAV fault diagnosis technology are discussed and future development directions are pointed out. The aim is to provide a reference for researchers in the field and to promote the improvement of UAV fault diagnosis technology in China.

Multi-Object Tracking Algorithm Based on CNN-Transformer Feature Fusion

ZHANG Yingjun, BAI Xiaohui, XIE Binhong

Computer Engineering and Applications 2024, 60 (2): 180-190. DOI: 10.3778/j.issn.1002-8331.2211-0028

In a convolutional neural network (CNN), convolution can efficiently extract local features of an object but has difficulty capturing a global representation; in the visual Transformer, the attention mechanism can capture long-distance feature dependencies but ignores local feature details. To solve these problems, a multi-object tracking algorithm, CTMOT (CNN transformer multi-object tracking), is proposed, which uses a CNN-Transformer hybrid network for feature extraction and fusion. Firstly, a backbone network based on CNN and Transformer extracts the local and global features of the image respectively. Secondly, a two-way bridge module (TBM) is used to fully integrate the two kinds of features. Then, the fused features are fed to two parallel decoders for processing. Finally, the detection boxes and tracking boxes output by the decoders are matched to obtain the final tracking result and complete the multi-object tracking task. Evaluated on the MOT17, MOT20, KITTI and UA-DETRAC multi-object tracking datasets, the CTMOT algorithm reaches MOTA values of 76.4%, 66.3%, 92.36% and 88.57% respectively. It is comparable to SOTA methods on the MOT datasets and achieves SOTA results on the KITTI dataset. The MOTP and IDs indicators reach SOTA results on all datasets. In addition, since object detection and association are completed at the same time, tracking can be carried out end-to-end at a speed of up to 35 FPS, which shows that the CTMOT algorithm achieves a good balance between real-time performance and tracking accuracy and has great potential.
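
For readers unfamiliar with the MOTA indicator quoted above, the following minimal sketch computes it from error counts using the standard CLEAR MOT definition. The function name `mota` and the counts in the example are hypothetical and are not taken from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt_objects):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt_objects <= 0:
        raise ValueError("num_gt_objects must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt_objects

# Hypothetical counts for a short sequence, chosen only to illustrate the formula.
score = mota(false_negatives=120, false_positives=70, id_switches=10, num_gt_objects=1000)
```

Because the three error types are summed before normalization, MOTA can become negative when a tracker makes more errors than there are ground-truth objects.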

Improved YOLOv8 Object Detection Algorithm for Traffic Sign Target

TIAN Peng, MAO Li

Computer Engineering and Applications 2024, 60 (8): 202-212. DOI: 10.3778/j.issn.1002-8331.2309-0415

Although detection technology is becoming increasingly mature, detecting small targets in complex environments remains the most difficult research problem. To address the high proportion of small targets among traffic signs in road traffic scenarios and the strong environmental interference, a traffic sign detection algorithm based on an improved YOLOv8 is proposed. Because small targets are prone to missed detections, the bi-level routing attention (BRA) mechanism is used to improve the network’s perception of small targets. In addition, the deformable convolution module Deformable Convolution V3 (DCNV3) is used. It extracts features of irregular shapes in the feature map more effectively, so that the backbone network adapts better to irregular spatial structures and attends more accurately to important targets, thereby improving the model’s ability to detect occluded and overlapping targets. Both the DCNV3 and BRA modules improve the accuracy of the model without increasing its weight. At the same time, the Inner-IoU loss function based on auxiliary bounding boxes is introduced. Small-sample training, large-sample training, single-target detection, and multi-target detection are performed on four datasets: RoadSign, CCTSDB, TSDD, and GTSDB. The experimental results improve, and those on the RoadSign dataset are the best: the improved YOLOv8 model reaches an mAP50 of 90.7% and an mAP50:95 of 75.1%, which are 5.9 and 4.8 percentage points higher than the baseline model, respectively. The experimental results show that the improved YOLOv8 model effectively detects traffic signs in complex road scenarios.

Improved YOLOv8 Multi-Scale and Lightweight Vehicle Object Detection Algorithm

ZHANG Lifeng, TIAN Ying

Computer Engineering and Applications 2024, 60 (3): 129-137. DOI: 10.3778/j.issn.1002-8331.2309-0145

To address issues such as high hardware requirements, low detection accuracy, and a high rate of missed overlapping targets in traditional vehicle object detection models, a modified vehicle object detection algorithm based on YOLOv8, called RBT-YOLO, is proposed. The main network is reconstructed using a multi-scale fusion approach. BiFPN is improved by adding convolutional operations and adjusting the numbers of input and output channels to adapt it to YOLOv8, enhancing its feature fusion capability. After the feature maps are output from the Neck section, a lightweight attention mechanism called Triplet Attention is introduced to enhance the feature extraction ability of the model. To address the high target overlap found in real scenarios, SoftNMS (soft non-maximum suppression) replaces the original NMS so that the model treats candidate boxes more gently, thereby strengthening its detection capability and improving recall. Experimental results on the Pascal VOC and MS COCO datasets demonstrate that the proposed RBT-YOLO outperforms the original model, reducing parameters and computation by approximately 60% and improving mAP by 2.6 and 3.0 percentage points, respectively. It also compares favorably with other classic detection models in both size and precision, demonstrating strong practical utility.
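
To make the difference between NMS and SoftNMS concrete, here is a minimal sketch of the Gaussian variant of Soft-NMS in plain Python. It is an illustration under assumptions rather than the RBT-YOLO implementation: the function names, the `sigma` and `score_threshold` defaults, and the sample boxes are all hypothetical.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS; returns (index, rescored confidence) pairs in selection order."""
    candidates = list(zip(range(len(boxes)), scores))
    kept = []
    while candidates:
        # Select the highest-scoring remaining box.
        best = max(candidates, key=lambda c: c[1])
        candidates.remove(best)
        kept.append(best)
        # Decay the scores of overlapping boxes instead of deleting them outright.
        decayed = []
        for idx, score in candidates:
            overlap = iou(boxes[best[0]], boxes[idx])
            score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_threshold:
                decayed.append((idx, score))
        candidates = decayed
    return kept

# Two heavily overlapping vehicles plus one isolated vehicle (hypothetical boxes).
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.82. Hard NMS with a typical 0.5 threshold would discard it, whereas Soft-NMS keeps it with a reduced score, which is how the technique can raise recall for overlapping targets.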

Small Sample Steel Plate Defect Detection Algorithm of Lightweight YOLOv8

DOU Zhi, GAO Haoran, LIU Guoqi, CHANG Baofang

Computer Engineering and Applications 2024, 60 (9): 90-100. DOI: 10.3778/j.issn.1002-8331.2311-0070

Steel plates have a large surface area, and surface defects are very common; they fall into many classes, each with few samples. Deep learning is therefore difficult to apply effectively to the detection of such small-sample defects. To solve this problem, a small-sample steel plate defect detection algorithm based on a lightweight YOLOv8 is proposed. Firstly, an interactive data augmentation algorithm based on fuzzy search is proposed, which addresses the problem that the network model cannot be trained effectively for lack of training samples and makes it possible to apply deep learning in this field. Then, LMRNet (lightweight multi-scale residual networks) is designed to replace the backbone of YOLOv8, making the network model lightweight and improving its portability. Finally, the CBFPN (context bidirectional feature pyramid network) and ECSA (efficient channel spatial attention) modules are proposed to make the network more effective at extracting and fusing scar features, and the Wise-IoU loss function is adopted to improve detection performance. Comparative experiments show that, relative to the original YOLOv8 algorithm, the improved network has only 30% of the parameters and 49% of the computation, and its FPS increases by 9 frame/s. Accuracy, recall and mAP increase by 2.9, 6.5 and 5.5 percentage points respectively. The experimental results verify the advantages of the proposed algorithm.

Survey on Video-Text Cross-Modal Retrieval

CHEN Lei, XI Yimeng, LIU Libo

Computer Engineering and Applications 2024, 60 (4): 1-20. DOI: 10.3778/j.issn.1002-8331.2306-0382

Modalities define the specific forms in which data exist. The swift expansion of various modal data types has brought multimodal learning into the limelight. As a crucial subset of this field, cross-modal retrieval has achieved noteworthy advancements, particularly in integrating images and text. However, videos, as opposed to images, encapsulate a richer array of modal data and offer a more extensive spectrum of information. This richness aligns well with the growing user demand for comprehensive and adaptable information retrieval solutions. Consequently, video-text cross-modal retrieval has emerged as a burgeoning area of research in recent times. To thoroughly comprehend video-text cross-modal retrieval and its state-of-the-art developments, a methodical review and summarization of the existing representative methods is conducted. Initially, the focus is on analyzing current deep learning-based unidirectional and bidirectional video-text cross-modal retrieval methods. This analysis includes an in-depth exploration of seminal works within each category, highlighting their strengths and weaknesses. Subsequently, the discussion shifts to an experimental viewpoint, introducing benchmark datasets and evaluation metrics specific to video-text cross-modal retrieval. The performance of several standard methods on benchmark datasets is compared. Finally, the application prospects and future research challenges of video-text cross-modal retrieval are discussed.

Review of Deep Learning Methods Applied to Medical CT Super-Resolution

TIAN Miaomiao, ZHI Lijia, ZHANG Shaomin, CHAO Daifu

Computer Engineering and Applications 2024, 60 (3): 44-60. DOI: 10.3778/j.issn.1002-8331.2303-0224

Image super-resolution (SR) is one of the important processing methods for improving image resolution in computer vision, and it has important research significance and application value in medical imaging. High-quality, high-resolution medical CT images are very important in current clinical practice. In recent years, deep learning-based super-resolution reconstruction of medical CT images has made remarkable progress. This paper reviews the representative methods in this field and systematically traces the development of medical CT image super-resolution reconstruction technology. Firstly, the basic theory of SR is introduced, and the commonly used evaluation indexes are given. Then, the paper focuses on the innovation and progress of deep learning-based super-resolution reconstruction of medical CT images and makes a comprehensive comparative analysis of the main characteristics and performance of each method. Finally, the difficulties and challenges of medical CT image super-resolution reconstruction are discussed, and future development trends are summarized, in the hope of providing a reference for related research.

Algorithm for Real-Time Vehicle Detection from UAVs Based on Optimizing and Improving YOLOv8

SHI Tao, CUI Jie, LI Song

Computer Engineering and Applications 2024, 60 (9): 79-89. DOI: 10.3778/j.issn.1002-8331.2312-0291

To address the low accuracy, susceptibility to background interference, and difficulty in detecting small target vehicles of existing UAV vehicle detection algorithms, an improved UAV vehicle detection algorithm, YOLOv8-CX, is proposed based on YOLOv8. By integrating the advantages of Deformable Convolutional Networks v1-3, a C2f-DCN module is proposed to sample features flexibly and better extract features of vehicles of different sizes. Drawing on the idea of large separable kernel attention, an SPPF-LSKA module with long-range dependency and self-adaptability is proposed, which effectively reduces background interference in vehicle detection. In the neck network, a CF-FPN (ment network for tiny object detection) feature fusion structure is adopted to enhance the detection accuracy of small targets by combining contextual information and suppressing conflicts between features at different scales. Finally, the original YOLOv8 head is replaced with a Dynamic Head detection head, which unifies three types of attention (scale, spatial and task) to further improve detection performance. Experimental results show that on the Mapsai dataset, compared with the original algorithm, the improved algorithm increases precision (P), recall (R) and mean average precision (mAP) by 8.5, 11.2 and 6.2 percentage points respectively, and its detection speed reaches 72.6 FPS, meeting the real-time requirements of UAV vehicle detection. Comparison with other mainstream object detection algorithms validates the effectiveness and superiority of this method.

Survey of Agricultural Knowledge Graph

TANG Wentao, HU Zelin

Computer Engineering and Applications 2024, 60 (2): 63-76. DOI: 10.3778/j.issn.1002-8331.2305-0203

Knowledge graphs are a key technology in the era of big data, specifically for knowledge engineering. The powerful semantic understanding and knowledge organization capabilities of knowledge graphs can resolve issues in the construction of modern agriculture such as scattered and disordered agricultural knowledge and insufficient knowledge coverage. Firstly, considering the complexity and specialized nature of agricultural data, the construction methods and framework of agricultural knowledge graphs are introduced. Secondly, the current domestic and international research status of the four key technologies in the construction of agricultural knowledge graphs (ontology construction, knowledge extraction, knowledge fusion, and knowledge reasoning) is reviewed. Furthermore, the systematic applications of agricultural knowledge graphs in decision support, intelligent question-answering systems, and recommendation systems are surveyed. Lastly, several specific instances of agricultural knowledge graphs are presented. Based on the current status of research on agricultural knowledge graphs, prospects for future research directions are offered.

Improved YOLOv8 Small Target Detection Algorithm in Aerial Images

FU Jinyi, ZHANG Zijia, SUN Wei, ZOU Kaixin

Computer Engineering and Applications 2024, 60 (6): 100-109. DOI: 10.3778/j.issn.1002-8331.2311-0281

In aerial image detection tasks, targets are small relative to the overall image, scales vary, and detail information is unclear, which leads to missed and false detections. To address this, an improved small target detection algorithm, CA-YOLOv8, is proposed. Channel feature partial convolution (CFPConv) is designed, and on this basis the Bottleneck structure in C2f is reconstructed and named CFP_C2f. Some C2f modules in the YOLOv8 head and neck are replaced with it, which enhances the effective channel feature weights and improves the ability to obtain multi-scale detail features. A context aggregated module (CAM) is embedded to improve context aggregation, optimize the response of feature channels, and strengthen the perception of details in deep features. The NWD loss function is added and combined with CIoU as the localization regression loss to reduce sensitivity to position deviation. To make full use of the advantages of multiple attention mechanisms, the original detection head is replaced with DyHead (dynamic head). In experiments on the VisDrone2019 dataset, the improved algorithm reduces the number of parameters by 33.3% compared with the original YOLOv8s model, and mAP50 and mAP50:95 increase by 8.7 and 5.7 percentage points respectively, showing good performance and confirming its effectiveness.
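
The reason NWD reduces sensitivity to position deviation can be seen in a small sketch. It uses the commonly published normalized Gaussian Wasserstein distance formulation, in which each box is modeled as a 2D Gaussian, and compares it with IoU for a tiny box. The function names, the sample boxes, and the constant `c=12.8` are assumptions for illustration; the abstract does not state how NWD and CIoU are weighted, so the combination is not shown.

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between boxes given as (cx, cy, w, h).

    Each box is modeled as a 2D Gaussian with mean (cx, cy) and covariance
    diag(w^2 / 4, h^2 / 4). The constant c is dataset dependent; 12.8 is an
    assumed placeholder, not a value from the paper.
    """
    w2_squared = (
        (box_a[0] - box_b[0]) ** 2
        + (box_a[1] - box_b[1]) ** 2
        + ((box_a[2] - box_b[2]) / 2) ** 2
        + ((box_a[3] - box_b[3]) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)

def iou_cxcywh(box_a, box_b):
    """Plain IoU for boxes given as (cx, cy, w, h), used here only for comparison."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

# A hypothetical 6x6 pixel target and a prediction shifted by 6 pixels.
target = (10.0, 10.0, 6.0, 6.0)
shifted = (16.0, 10.0, 6.0, 6.0)
iou_value = iou_cxcywh(target, shifted)  # boxes no longer overlap
nwd_value = nwd(target, shifted)         # still a smooth, non-zero similarity
```

For the shifted box, IoU drops to zero and gives no gradient signal about how far off the prediction is, while NWD still decreases smoothly with the offset. A loss of the form 1 - NWD can therefore complement an IoU-based term for very small targets.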

Review on Human Action Recognition Methods Based on Multimodal Data

WANG Cailing, YAN Jingjing, ZHANG Zhidong

Computer Engineering and Applications 2024, 60 (9): 1-18. DOI: 10.3778/j.issn.1002-8331.2310-0090

Human action recognition (HAR) is widely applied in intelligent security, autonomous driving and human-computer interaction. With advances in capture equipment and sensor technology, the data that can be acquired for HAR is no longer limited to RGB data but also includes multimodal data such as depth, skeleton, and infrared data. Feature extraction methods for HAR based on the RGB and skeleton data modalities are introduced in detail, including handcrafted and deep learning-based methods. For the RGB modality, feature extraction algorithms based on the two-stream convolutional neural network (2s-CNN), the 3D convolutional neural network (3DCNN) and hybrid networks are analyzed. For the skeleton modality, some popular single-person and multi-person pose estimation algorithms are first introduced. Classification algorithms based on the convolutional neural network (CNN), recurrent neural network (RNN), and graph convolutional neural network (GCN) are then analyzed in depth. The common datasets for both data modalities are also presented comprehensively. In addition, the current challenges are explored based on the data structure features of the RGB and skeleton modalities. Finally, future research directions for deep learning-based HAR methods are discussed.

Review of Development of Deep Learning Optimizer

CHANG Xilong, LIANG Kun, LI Wentao

Computer Engineering and Applications 2024, 60 (7): 1-12. DOI: 10.3778/j.issn.1002-8331.2307-0370

Optimization algorithms, which work by minimizing the loss function, are the most critical factor in improving the performance of deep learning models. As large language models (LLMs) such as GPT have become the research focus in natural language processing, the optimization effect of the traditional gradient descent algorithm has become limited. Adaptive moment estimation algorithms have therefore emerged, which are significantly superior to traditional optimization algorithms in generalization ability. The pros and cons of optimization algorithms are analyzed across three families: gradient descent, adaptive gradient, and adaptive moment estimation. This paper applies the optimization algorithms to the Transformer architecture and selects the French-English translation task as the evaluation benchmark. Experiments show that adaptive moment estimation algorithms can effectively improve the performance of the model in machine translation tasks. The paper also discusses the development direction and applications of optimization algorithms.
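
As a reminder of what an adaptive moment estimation update looks like, the sketch below applies the standard Adam rule to a one-dimensional quadratic. It is a toy illustration only; the function name `adam_minimize`, the objective, and the hyperparameter values (the usual published defaults apart from the learning rate) are assumptions and do not reproduce the paper's Transformer experiments.

```python
import math

def adam_minimize(grad, x0, steps=2000, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first moment: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # second moment: running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter scaled step
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
```

Plain gradient descent would use `x -= lr * g` directly. Adam differs by smoothing the gradient with `m` and dividing by the root of `v`, so the effective step adapts to the recent gradient magnitude.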

Survey of Few-Shot Image Classification Based on Deep Meta-Learning

ZHOU Bojun, CHEN Zhiyu

Computer Engineering and Applications 2024, 60 (8): 1-15. DOI: 10.3778/j.issn.1002-8331.2308-0271

Deep meta-learning has emerged as a popular paradigm for addressing few-shot classification problems. A comprehensive review of recent advancements in few-shot image classification algorithms based on deep meta-learning is provided. Starting from the problem description, the categories of deep meta-learning algorithms for few-shot image classification are summarized, and commonly used few-shot image classification datasets and evaluation criteria are introduced. Subsequently, typical models and the latest research progress are elaborated in three aspects: model-based, optimization-based, and metric-based deep meta-learning methods. Finally, a performance analysis of existing algorithms on popular public datasets is presented, the research hotspots in this topic are summarized, and future research directions are discussed.
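
To give a flavor of the metric-based family, the sketch below classifies a query by its nearest class prototype, in the spirit of prototypical networks. The abstract does not name specific models, so this choice, the function names, and the toy two-dimensional embeddings (which a trained encoder would normally produce) are all assumptions.

```python
import math

def class_prototypes(support):
    """Average the support embeddings of each class: {label: [vectors]} -> {label: mean vector}."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return prototypes

def classify(query, prototypes):
    """Return the label whose prototype is nearest to the query in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# A hypothetical 2-way 2-shot episode with 2D embeddings.
support = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 0.8]],
}
prediction = classify([0.9, 0.3], class_prototypes(support))
```

In a full method the embedding function is what gets meta-learned across many such episodes; the nearest-prototype rule itself has no trainable parameters.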

Vehicle Detection Algorithm Based on Improved YOLOv8 in Traffic Surveillance

ZHOU Fei, GUO Dudu, WANG Yang, WANG Qingqing, QIN Yin, YANG Zhuomin, HE Haijun

Computer Engineering and Applications 2024, 60 (6): 110-120. DOI: 10.3778/j.issn.1002-8331.2310-0101

To address the insufficient vehicle detection accuracy and slow detection speed in complex traffic monitoring scenarios, a lightweight vehicle detection algorithm based on the YOLOv8 model is proposed. Firstly, FasterNet replaces the backbone feature extraction network of YOLOv8, which reduces redundant computation and memory access and improves the detection accuracy and inference speed of the model. Secondly, the SimAM attention module is added to the Backbone and Neck sections, which enhances the important features of target vehicles without increasing the original network parameters and improves the feature fusion capability. Then, to address the poor detection of small vehicles in dense traffic flow, a small target detection head is added to better capture the features and contextual information of small vehicles. Finally, Wise-IoU, which can adaptively adjust the weight coefficients, is used as the loss function of the improved model, enhancing the regression performance of the bounding box and the robustness of detection. Experimental results on the UA-DETRAC dataset show that, compared with the original model, the improved method achieves better detection accuracy and speed in the traffic monitoring system, with mAP and FPS improved by 3.06 percentage points and 3.36%, respectively. It effectively mitigates the poor detection of small target vehicles in complex traffic scenarios and achieves a good balance between accuracy and speed.
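
The claim that SimAM adds no network parameters can be illustrated with a minimal sketch of its attention weighting for a single channel, following the commonly published SimAM formulation. The function name `simam_channel`, the default `e_lambda`, and the toy feature map are assumptions; a real implementation would operate on batched tensors in a deep learning framework.

```python
import math

def simam_channel(feature_map, e_lambda=1e-4):
    """Apply SimAM-style parameter-free attention to one channel (a 2D list of floats)."""
    values = [v for row in feature_map for v in row]
    n = len(values) - 1  # number of other neurons in the channel
    mean = sum(values) / len(values)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in feature_map]
    variance = sum(d for row in sq_dev for d in row) / n
    out = []
    for row, dev_row in zip(feature_map, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Inverse energy: larger for neurons that stand out from the channel mean.
            inv_energy = d / (4 * (variance + e_lambda)) + 0.5
            weight = 1 / (1 + math.exp(-inv_energy))  # sigmoid gate
            out_row.append(v * weight)
        out.append(out_row)
    return out

# Toy 3x3 channel with one strong activation in the center.
feature_map = [
    [0.1, 0.1, 0.1],
    [0.1, 2.0, 0.1],
    [0.1, 0.1, 0.1],
]
attended = simam_channel(feature_map)
```

The weights come entirely from channel statistics (mean and variance), so the only tunable quantity is the small constant `e_lambda`, which is a fixed hyperparameter rather than a learned parameter.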

Research on Gesture Recognition Based on Improved YOLOv5 and Mediapipe

NI Guangxing, XU Hua, WANG Chao

Computer Engineering and Applications 2024, 60 (7): 108-118. DOI: 10.3778/j.issn.1002-8331.2308-0097

Abstract （250）

PDF（pc）（686KB）（199）

Existing gesture recognition algorithms suffer from heavy computation and poor robustness. In this paper, a gesture recognition method based on the IYOLOv5-Med (improved YOLOv5 Mediapipe) algorithm is proposed. This algorithm combines the improved YOLOv5 algorithm with the Mediapipe method and consists of gesture detection and gesture analysis. In the gesture detection part, the traditional YOLOv5 algorithm is improved. Firstly, the C3 module is reconstructed by FastNet. Secondly, the CBS module is replaced by the GhostConv module in GhostNet. Thirdly, the SE attention mechanism module is introduced at the end of the Backbone network. The improved algorithm has a smaller model size and is more suitable for edge devices with limited resources. In the gesture analysis part, a method based on Mediapipe is proposed. The key points of the hand are detected in the gesture area located by the gesture detection part, the relevant features are extracted, and the gesture is then identified by a naive Bayes classifier. The experimental findings confirm the efficacy of the IYOLOv5-Med algorithm. Compared with the conventional YOLOv5 algorithm, the parameters are reduced by 34.5%, the computations are reduced by 34.9%, and the model weight is decreased by 33.2%. The final average recognition rate reaches 0.997, and the implementation is relatively simple, giving the method good application prospects.

Survey of Chinese Named Entity Recognition Research

ZHAO Jigui, QIAN Yurong, WANG Kui, HOU Shuxiang, CHEN Jiaying

Computer Engineering and Applications 2024, 60 (1): 15-27. DOI: 10.3778/j.issn.1002-8331.2304-0398

Abstract （249）

PDF（pc）（606KB）（166）

Named entity recognition (NER) is one of the most fundamental tasks in natural language processing, and its main goal is to identify the types and boundaries of entities with specific meanings in natural language text. However, the data samples of Chinese named entity recognition (CNER) have problems such as blurred word boundaries, semantic diversity, blurred morphological features, and limited Chinese corpus resources, which make it difficult to improve the performance of Chinese NER. In this paper, firstly, the datasets, annotation schemes, and evaluation metrics of CNER are introduced. Secondly, according to the research process of CNER, CNER methods are classified into three categories: rule-based methods, statistical-based methods, and deep learning-based methods, and the main deep learning-based CNER models of the past five years are summarized. Finally, the research trends of CNER are discussed to provide some reference for the proposal of new methods and future research directions.
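
Annotation schemes for CNER typically tag each character. As an illustration, the sketch below decodes the widely used BIO scheme into entity spans; the abstract does not say which schemes the survey covers, so the choice of BIO, the function name `decode_bio`, and the example sentence are assumptions.

```python
def decode_bio(chars, tags):
    """Turn character-level BIO tags into (text, type, start, end) spans."""
    entities = []
    start, etype = None, None
    # A trailing "O" sentinel flushes an entity that ends at the last character.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.append(("".join(chars[start:i]), etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


# Hypothetical example: "张三在北京工作" (Zhang San works in Beijing).
chars = list("张三在北京工作")
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
entities = decode_bio(chars, tags)
```

Because Chinese text has no whitespace between words, tagging at the character level sidesteps the word-boundary ambiguity mentioned in the abstract. Stray `I-` tags without a preceding `B-` are ignored by this sketch.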

Survey on Attack Methods and Defense Mechanisms in Federated Learning

ZHANG Shiwen, CHEN Shuang, LIANG Wei, LI Renfa

Computer Engineering and Applications 2024, 60 (5): 1-16. DOI: 10.3778/j.issn.1002-8331.2306-0243

Abstract （247）

PDF（pc）（792KB）（267）

Attack and defense techniques are the core issue of federated learning system security. Effective defense techniques can significantly reduce the risk of attack and greatly enhance the security of federated learning systems, and a deep understanding of attack and defense techniques can advance research in the field and promote the widespread application of federated learning. Therefore, it is of great significance to study the attack and defense techniques of federated learning. Firstly, this paper briefly introduces the concept, basic workflow, types, and potential security issues of federated learning. Subsequently, the paper introduces the attacks that a federated learning system may encounter and summarizes the relevant research along the way. Then, according to whether the federated learning system has targeted defense measures, the defense measures are divided into two categories: universal defense measures and targeted defense measures, and each category is summarized. Finally, the paper reviews and analyzes future research directions for the security of federated learning, providing a reference for researchers working on this topic.
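
The basic workflow mentioned above centres on a server that aggregates client updates. A minimal sketch of sample-size-weighted averaging (the FedAvg aggregation rule) is given below, with a toy case showing how a single scaled update can skew the result; the function name and all numbers are hypothetical and are not drawn from the surveyed work.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


# Two honest clients holding 1 and 3 samples respectively.
honest = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])

# Toy poisoning case: the second client sends a heavily scaled update.
poisoned = fed_avg([[1.0, 2.0], [300.0, 400.0]], [1, 3])
```

Plain averaging has no mechanism for bounding the influence of one client, which is why `poisoned` drifts far from `honest`.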

Small Object Detection Algorithm Based on ATO-YOLO

SU Jia, QIN Yichang, JIA Ze, WANG Jing

Computer Engineering and Applications 2024, 60 (6): 68-77. DOI: 10.3778/j.issn.1002-8331.2308-0385

Abstract （245）

PDF（pc）（795KB）（227）

Small object detection is of great significance in the field of computer vision. However, existing methods often suffer from issues such as missed detection and false alarms when dealing with challenges like scale variation, dense object arrangement, and irregular layouts. To address these problems, ATO-YOLO, an improved version of the YOLOv5 algorithm, is proposed. Firstly, this paper introduces an adaptive feature extraction (AFE) module that incorporates an attention mechanism to enhance the feature representation capability of the detection model. By dynamically adjusting the weight allocation to highlight key object features, AFE improves the accuracy and robustness of object detection tasks in various scenarios. Secondly, a triple feature fusion (TFF) mechanism is designed to effectively utilize multi-scale information by fusing feature maps from different scales, resulting in more comprehensive object features and enhanced detection performance for small objects. Lastly, an output reconstruction (ORS) module is introduced, which removes the large object detection layer and adds a small object detection layer, enabling precise localization and recognition of small objects. This module also reduces model complexity and improves detection speed compared to the original model. Experimental results demonstrate that the ATO-YOLO algorithm achieves an mAP@0.5 of 38.2% on the VisDrone dataset, a 6.1 percentage point improvement over YOLOv5, with a relative FPS increase of 4.4%. This algorithm enables fast and accurate detection of small objects.
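
The abstract reports the accuracy gain in percentage points (an absolute difference) and the speed gain as a relative percentage. The short sketch below separates the two; the baseline mAP it prints is derived from the two figures quoted above rather than reported directly, and the helper names are illustrative.

```python
def baseline_from_gain(improved, gain_pp):
    """Recover the baseline from an improved value and a percentage-point gain."""
    return improved - gain_pp


def relative_gain_percent(baseline, improved):
    """Relative improvement over the baseline, in percent."""
    return (improved - baseline) / baseline * 100.0


map_improved = 38.2  # mAP@0.5 in percent, from the abstract
gain_pp = 6.1        # percentage points, from the abstract

map_baseline = baseline_from_gain(map_improved, gain_pp)         # about 32.1
map_relative = relative_gain_percent(map_baseline, map_improved)  # about 19.0
```

A gain of 6.1 percentage points on a baseline near 32.1% corresponds to a relative gain of roughly 19%, so the two kinds of figure are not directly comparable.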

Lightweight Foggy Weather Object Detection Method Based on YOLOv5

LAI Jing’an, CHEN Ziqiang, SUN Zongwei, PEI Qingqi

Computer Engineering and Applications 2024, 60 (6): 78-88. DOI: 10.3778/j.issn.1002-8331.2308-0029

Abstract （239）

PDF（pc）（1220KB）（232）

Aiming at the low accuracy and high model complexity of object detection algorithms in foggy scenes, a lightweight foggy object detection method based on YOLOv5 is proposed. Firstly, this paper adopts the receptive field attention module (RFAblock), which adds an attention mechanism to the receptive field by interacting with the receptive field feature information to improve the feature extraction ability. Secondly, the lightweight network Slimneck is used as the neck structure to reduce the model parameters and complexity while maintaining accuracy. The angle vector between the ground-truth box and the predicted box is introduced in the loss function to improve the training speed and inference accuracy. PNMS (precise non-maximum suppression) is used to improve the candidate box selection mechanism and reduce the missed detection rate in the case of vehicle occlusion. Finally, experiments are conducted on the real foggy dataset RTTS and the synthetic foggy dataset Foggy Cityscapes. The results show that the mAP50 is improved by 4.9 and 3.5 percentage points, respectively, compared with YOLOv5l, and the number of model parameters is only 54.6% of that of YOLOv5l.
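
Candidate box selection is conventionally done with greedy non-maximum suppression (NMS). The sketch below shows that standard baseline only; it is not the PNMS variant proposed in the paper, whose details the abstract does not give. The box layout `(x1, y1, x2, y2)` and the threshold value are assumptions.

```python
def _iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first (IoU about 0.68)
```

With occluded vehicles, two true objects can overlap strongly, and plain NMS may discard one of them; this is the kind of missed detection the abstract refers to.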

Baggage Tracking Technology Based on Improved YOLO v8

CAO Chao, GU Xingsheng

Computer Engineering and Applications 2024, 60 (9): 151-158. DOI: 10.3778/j.issn.1002-8331.2310-0238

Abstract （238）

PDF（pc）（6479KB）（346）

In the airport baggage sorting scenario, traditional multi-target tracking algorithms suffer from a high target ID switching rate and a high false alarm rate for target trajectories. This paper presents a baggage tracking technique based on improved YOLO v8 and ByteTrack algorithms. A CBATM module is added, the detection head is replaced with the ADH decoupled head, and the training loss function is changed, which increases detection accuracy, strengthens the discrimination of target features, and reduces the target ID switching rate. GSI interpolation is applied in the BYTE data association, which not only uses both high-score and low-score boxes but also maintains tracking after long occlusions and reduces the erroneous ID switches caused by occlusion. On the airport baggage sorting dataset, MOTA and IDF1 reach 89.9% and 90.3%, respectively, a significant improvement that allows baggage IDs to be tracked stably.
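
MOTA and IDF1 are the two tracking metrics quoted above. The sketch below computes both from their standard definitions; every count in the example is hypothetical and chosen only to make the arithmetic easy to follow, not taken from the paper's experiments.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multi-object tracking accuracy: 1 - (FN + FP + IDSW) / GT."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """Identity F1 score: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    return 2.0 * idtp / (2.0 * idtp + idfp + idfn)


# Hypothetical counts over a whole sequence (illustration only).
example_mota = mota(false_negatives=50, false_positives=30,
                    id_switches=20, num_gt=1000)
example_idf1 = idf1(idtp=900, idfp=100, idfn=100)
```

MOTA penalizes each ID switch once, while IDF1 measures how consistently each trajectory keeps the correct identity, so reducing ID switches tends to help IDF1 more visibly.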

Review of Unsupervised Domain Adaptation in Medical Image Segmentation

HU Wei, XU Qiaozhi, GE Xiangwei, YU Lei

Computer Engineering and Applications 2024, 60 (6): 10-26. DOI: 10.3778/j.issn.1002-8331.2307-0421

Abstract （237）

PDF（pc）（756KB）（215）

Medical image segmentation has broad application prospects in the field of medical image processing, providing auxiliary information for diagnosis and treatment by locating and segmenting organs, tissues, or lesion areas of interest. However, there is a domain shift problem between different modalities of medical images, which can lead to a significant decrease in the performance of the segmentation model during actual deployment. Domain adaptation technology is an effective way to solve this problem, especially unsupervised domain adaptation, which has become a research hotspot in the field of medical image processing because it does not require target domain label information. At present, there are relatively few reviews of unsupervised domain adaptation research in medical image segmentation. Therefore, this paper summarizes and analyzes recent research on unsupervised domain adaptation in medical image segmentation and discusses its future prospects, hoping to help researchers quickly become familiar with the current research status and trends in this field.

Survey of Vision Transformer in Low-Level Computer Vision

ZHU Kai, LI Li, ZHANG Tong, JIANG Sheng, BIE Yiming

Computer Engineering and Applications 2024, 60 (4): 39-56. DOI: 10.3778/j.issn.1002-8331.2304-0139

Abstract （235）

PDF（pc）（3488KB）（175）

Transformer is a revolutionary neural network architecture initially designed for natural language processing. However, its outstanding performance and versatility have led to widespread applications in the field of computer vision. While there is a wealth of research and literature on Transformer applications in natural language processing, there remains a relative scarcity of specialized reviews focusing on low-level visual tasks. In light of this, this paper begins by providing a brief introduction to the principles of Transformer and analyzing several variants. Subsequently, the focus shifts to the application of Transformer in low-level visual tasks, specifically in the key areas of image restoration, image enhancement, and image generation. Through a detailed analysis of the performance of different models in these tasks, this paper explores the variations in their effectiveness on commonly used datasets. This includes achievements in restoring damaged images, improving image quality, and generating realistic images. Finally, this paper summarizes and forecasts the development trends of Transformer in the field of low-level visual tasks. It suggests directions for future research to further drive innovation and advancement in Transformer applications. The rapid progress in this field promises breakthroughs for computer vision and image processing, providing more powerful and efficient solutions for practical applications.

Algorithmic Research Overview on Graph Coloring Problems

SONG Jiahuan, WANG Xiaofeng, HU Simin, JIA Jingwei, YAN Dong

Computer Engineering and Applications 2024, 60 (18): 66-77. DOI: 10.3778/j.issn.1002-8331.2403-0434

Abstract （235）

PDF（pc）（4612KB）（186）

The graph coloring problem (GCP) is a classic combinatorial optimization problem that has been widely applied in various fields such as mathematics, computer science, and biological science. Because the graph coloring problem is NP-hard, no exact polynomial-time algorithm for it is currently known. In order to provide efficient algorithms for solving this problem, it is necessary to review the existing algorithms. These are mainly divided into intelligent optimization algorithms, heuristic algorithms, reinforcement learning algorithms, and others. A comparative analysis is carried out in terms of algorithm principles, improvement ideas, performance, and accuracy; the advantages and disadvantages of the algorithms are summarized, and the research directions and algorithm design paths of GCP are pointed out, which provides guidance for research on related problems.
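
As a concrete instance of the heuristic family mentioned above, the sketch below colors vertices greedily in order of decreasing degree (a largest-first heuristic). It is a generic textbook baseline and not a method attributed to the surveyed paper; the adjacency-dictionary input format is an assumption.

```python
def greedy_coloring(adj):
    """Assign each vertex the smallest color unused by its colored neighbors.

    `adj` maps each vertex to an iterable of its neighbors (undirected graph).
    """
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# A 5-cycle needs 3 colors, since odd cycles are not 2-colorable.
cycle5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
coloring = greedy_coloring(cycle5)
num_colors = max(coloring.values()) + 1
```

The heuristic always returns a proper coloring but offers no optimality guarantee in general, which is one motivation for the metaheuristic and learning-based approaches the review compares.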

Review of Research on Artificial Intelligence in Traditional Chinese Medicine Diagnosis and Treatment

SU Youli, HU Xuanyu, MA Shijie, ZHANG Yuning, Abudukelimu Abulizi, Halidanmu Abudukelimu

Computer Engineering and Applications 2024, 60 (16): 1-18. DOI: 10.3778/j.issn.1002-8331.2312-0400

Abstract （232）

PDF（pc）（6171KB）（262）

The field of traditional Chinese medicine (TCM) diagnosis and treatment is gradually moving towards standardization, objectification, modernization, and intelligence. In this process, the integration of artificial intelligence (AI) has greatly propelled the advancement of TCM diagnosis and treatment, scientific research, and TCM inheritance. The review starts from the current research status of AI in TCM, combs through the application and development of AI in TCM in three stages from expert system and rule engines, traditional machine learning algorithm to deep learning, and then summarizes the knowledge management tools and large language models of TCM in recent years. Finally, this paper analyzes the multiple challenges of data fairness, multimodal data understanding, model robustness, personalized medicine, and interpretability that exist at this stage of AI in TCM. To address these challenges, it is necessary to continuously explore and propose possible solutions to promote the in-depth development of intelligent TCM diagnosis and treatment, thus better meeting the health needs of people.

Review of Deep Learning Methods for Palm Vein Recognition

TAN Zhenlin, LIU Ziliang, HUANG Aiquan, CHEN Huihui, ZHONG Yong

Computer Engineering and Applications 2024, 60 (6): 55-67. DOI: 10.3778/j.issn.1002-8331.2306-0168

Abstract （223）

PDF（pc）（664KB）（129）

Palm vein recognition, as a new infrared biometrics technology, has become one of the research hotspots in the field of biometric recognition because of its advantages of high security and liveness detection. In recent years, a great deal of research in this field has promoted the development of palm vein recognition technology by introducing deep learning methods. In order to grasp the latest research status and development direction in the field of palm vein recognition, data acquisition and the mainstream algorithms of data pre-processing are classified and summarized, and the latest progress of palm vein recognition based on deep learning is classified and elaborated in terms of palm vein feature representation, network design and optimization, and lightweight network. In view of the bottleneck of single-modal recognition, the correlation algorithms of multimodal and multi-feature fusion recognition are analyzed and compared. The difficulties and challenges of current research on palm vein recognition are discussed, and the future development trends are prospected and summarized.

Review of Application of Visual Foundation Model SAM in Medical Image Segmentation

SUN Xing, CAI Xiaohong, LI Ming, ZHANG Shuai, MA Jingang

Computer Engineering and Applications 2024, 60 (17): 1-16. DOI: 10.3778/j.issn.1002-8331.2401-0136

Abstract （221）

PDF（pc）（7912KB）（218）

With the continuous development of foundation model technology, visual foundation models represented by the segment anything model (SAM) have made significant breakthroughs in the field of image segmentation. SAM, driven by prompts, accomplishes a series of downstream segmentation tasks, aiming to address all image segmentation issues comprehensively. Therefore, the application of SAM in medical image segmentation is of great significance, as its generalization performance can adapt to various medical images, providing healthcare professionals with a more comprehensive understanding of anatomical structures and pathological information. This paper introduces commonly used datasets for image segmentation and provides detailed explanations of SAM’s network architecture and generalization capabilities. It focuses on a thorough analysis of SAM’s application in five major categories of medical images: whole-slide imaging, magnetic resonance imaging, computed tomography, ultrasound, and multimodal images. The review summarizes the strengths and weaknesses of SAM, along with corresponding improvement methods. In light of current challenges in the field of medical image segmentation, the paper discusses and anticipates future directions for SAM’s development.

Research Progress on Recommendation Algorithms with Knowledge Graph Visualization Analysis

LIN Suqing, LUO Dingnan, ZHANG Shuhua

Computer Engineering and Applications 2024, 60 (21): 1-17. DOI: 10.3778/j.issn.1002-8331.2312-0032

Abstract （221）

PDF（pc）（1215KB）（286）

The application and proliferation of internet technology has caused an exponential growth in data, increasing the complexity of information retrieval from massive datasets. Recommendation algorithms have attracted significant attention for alleviating information overload, with relevant research findings continually emerging. A total of 4,773 Chinese and 4,531 English publications from 2012 to 2024 have been sourced from China National Knowledge Infrastructure (CNKI) and the Web of Science (WOS) core collection. The visualization tools CiteSpace and VOSviewer have been utilized to generate basic information and keyword co-occurrence graphs for the literature. Core technology keywords, including knowledge graph, graph neural network, and deep learning, have been extracted through graph analysis, and the corresponding representative recommendation algorithms have been selected. The core mechanisms and the underlying principles of the algorithms have been visually presented through charts, focusing on the limitations and challenges of existing research, as well as targeted solutions. Knowledge architecture diagrams have been developed for the algorithms associated with each core technology keyword, following the challenge-solution-source literature framework. The visualization of recommendation principles has been effectively implemented.

Improved YOLOv8 Lightweight UAV Target Detection Algorithm

HU Junfeng, LI Baicong, ZHU Hao, HUANG Xiaowen

Computer Engineering and Applications 2024, 60 (8): 182-191. DOI: 10.3778/j.issn.1002-8331.2310-0063

Abstract （219）

PDF（pc）（813KB）（301）

Aiming at the problem that UAV target detection algorithms are computationally complex and difficult to deploy, and that the long-tailed distribution of UAV data leads to low detection accuracy, a lightweight UAV target detection algorithm based on improved YOLOv8 (PC-YOLOv8-n) is proposed, which balances the detection accuracy and computation of the network and has some generalisation ability to long-tailed data distributions. Partial convolutional layers (PConv) replace the 3×3 convolutional layers in YOLOv8, making the network lightweight and addressing network redundancy and computational complexity. Dual-channel feature pyramids are fused, with additional top-down paths that fuse deep and shallow information, and a lightweight attention mechanism is introduced in the same layer to improve the feature extraction ability of the network. The equilibrium focus loss (EFL) is used as the category loss function to increase the category detection ability of the network by equalising the gradient weights of the tail categories during network training. The experimental results show that PC-YOLOv8-n performs well on the VisDrone2019 dataset, improving mAP50 accuracy by 1.6 percentage points over the original YOLOv8-n algorithm, while the parameters and computation of the model are reduced to 2.6×10^6 and 7.6 GFLOPs, respectively, and the detection speed reaches 77.2 FPS.

Review of Deep Learning Approaches for Recognizing Multiple Unsafe Behaviors in Workers

SU Chenyang, WU Wenhong, NIU Hengmao, SHI Bao, HAO Xu, WANG Jiamin, GAO Le, WANG Weitai

Computer Engineering and Applications 2024, 60 (5): 30-46. DOI: 10.3778/j.issn.1002-8331.2307-0168

Abstract （218）

PDF（pc）（808KB）（208）

With the development of deep learning, target detection and behavior recognition methods have made great progress in the field of worker unsafe behavior recognition. This paper systematically summarizes the relevant research work at home and abroad in recent years, elaborates the commonly used models and effects of target detection methods and behavior recognition methods, and focuses on reviewing the application of the two types of methods in the recognition of unsafe behaviors as well as research on their combination. The advantages, limitations, recognized behavior categories, and applicable scenarios of the various methods are comprehensively analyzed and compared. On this basis, the optimization measures for target detection and behavior recognition in recent years, including the commonly used optimization directions and means, are summarized, together with the improvement methods successfully applied in unsafe behavior recognition. The difficulties and problems in this research field are sorted out, and suggestions and future development trends are given to provide a reference for research in this field.

Review of Research on Multimodal Retrieval

JIN Tao, JIN Ran, HOU Tengda, YUAN Jie, GU Xiaozhe

Computer Engineering and Applications 2024, 60 (5): 62-75. DOI: 10.3778/j.issn.1002-8331.2305-0294

Abstract （197）

PDF（pc）（657KB）（164）

With the growth of multimodal data, multimodal retrieval technology has received a lot of attention. With the introduction of computer and big data technology in the automobile, medical, and other industries, a large amount of industry data is itself presented in multimodal form. With the rapid development of these industries, people’s demand for information is constantly increasing, and single-modal data retrieval can no longer meet it. In order to solve these problems and meet the need to retrieve data in one modality using queries from another, this paper studies multimodal retrieval methods through a literature review, analyzes different research methods such as common subspace, deep learning, and multimodal hashing algorithms, and sorts out the multimodal retrieval techniques proposed by researchers in recent years to solve these problems. Finally, the multimodal retrieval methods proposed in recent years are evaluated and compared according to the accuracy, efficiency, and characteristics of the retrieval. This paper also analyzes the challenges encountered in multimodal retrieval and looks forward to its future application prospects.

Survey on Distributed Assembly Permutation Flowshop Scheduling Problem

ZHANG Jing, SONG Hongbo, LIN Jian

Computer Engineering and Applications 2024, 60 (6): 1-9. DOI: 10.3778/j.issn.1002-8331.2307-0276

Abstract （194）

PDF（pc）（619KB）（173）

With the rapid development of modern manufacturing, the past decades have witnessed a trend in which jobs are first processed in distributed production factories and then assembled into the final products in an assembly factory. Such a manufacturing mode brings many advantages as well as some new challenges for resource scheduling. This paper surveys the literature on the distributed assembly permutation flowshop scheduling problem (DAPFSP). Firstly, the background and main issues in DAPFSP are introduced. Then, mathematical models, encoding and decoding schemes, and global and local search algorithms are thoroughly discussed for DAPFSP with the objective of minimizing the maximal completion time. Additionally, recent advances on DAPFSP with other objectives, such as total flow time, DAPFSP with other constraints, such as no-wait, and DAPFSP that takes issues such as setup time into consideration are also surveyed. Finally, several future research directions worthy of further investigation are pointed out.
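
The maximal completion time (makespan) objective can be illustrated for a single production factory. The sketch below evaluates a job permutation in one permutation flowshop using the usual recursion, in which a job starts on a machine once both the machine and the job's previous operation are finished. It deliberately omits the factory assignment and assembly stage of DAPFSP, and the processing times are hypothetical.

```python
def makespan(sequence, proc_times):
    """Makespan of a job permutation in one permutation flowshop.

    `proc_times[job][m]` is the processing time of `job` on machine `m`;
    every job visits machines 0, 1, ... in the same order.
    """
    if not sequence:
        return 0
    n_machines = len(proc_times[sequence[0]])
    finish = [0] * n_machines  # finish time of the latest job on each machine
    for job in sequence:
        prev = 0  # finish time of this job on the previous machine
        for m in range(n_machines):
            start = max(finish[m], prev)
            finish[m] = start + proc_times[job][m]
            prev = finish[m]
    return finish[-1]


# Three jobs on two machines (hypothetical processing times).
proc_times = [[3, 2], [1, 4], [2, 1]]
span_a = makespan([0, 1, 2], proc_times)
span_b = makespan([1, 0, 2], proc_times)
```

Here the second permutation finishes earlier than the first, which is the kind of difference that the search algorithms discussed in the survey exploit.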

Research Progress of Image Style Transfer Based on Neural Network

LIAN Lu, TIAN Qichuan, TAN Run, ZHANG Xiaohang

Computer Engineering and Applications 2024, 60 (9): 30-47. DOI: 10.3778/j.issn.1002-8331.2309-0204

Abstract （194）

PDF（pc）（7029KB）（231）

Image style transfer is the process of remapping the content of a specified image with a style image, and it is a research hotspot in artificial intelligence and computer vision. Traditional image style transfer methods are mainly based on physical and texture synthesis techniques, and their style transfer results are rough and less robust. With the emergence of image datasets and the proposal of various deep learning networks, many models and algorithms for image style transfer have emerged. This paper analyzes the current status of image style transfer research, traces the development of image style transfer and the latest research progress, and suggests future research directions through comparative analysis.

Review of Text Classification Methods Based on Graph Neural Networks

SU Yilei, LI Weijun, LIU Xueyang, DING Jianping, LIU Shixia, LI Haonan, LI Guanfeng

Computer Engineering and Applications 2024, 60 (19): 1-17. DOI: 10.3778/j.issn.1002-8331.2403-0142

Abstract （191）

PDF（pc）（3425KB）（223）

Text classification is an important task in the field of natural language processing, aiming to assign given text data to a predefined set of categories. Traditional text classification methods can only handle data in Euclidean space and cannot process non-Euclidean data such as graphs. For text data with graph structure, it is not directly processable and cannot capture the non-Euclidean structure in the graph. Therefore, how to apply graph neural networks to text classification tasks is one of the current research hotspots. This paper reviews the text classification methods based on graph neural networks. Firstly, it outlines the traditional text classification methods based on machine learning and deep learning, and summarizes the background and principles of graph convolutional neural networks. Secondly, it elaborates on the text classification methods based on graph neural networks according to different types of graph networks, and conducts an in-depth analysis of the application of graph neural network models in text classification. Then, it compares the current text classification models based on graph neural networks through comparative experiments and discusses the classification performance of the models. Finally, it proposes future research directions to further promote the development of this field.

DY-YOLOv5: Target Detection for Aerial Image Based on Multiple Attention

ZHAO Xin, CHEN Lili, YANG Weichuan, ZHANG Chengwang

Computer Engineering and Applications 2024, 60 (7): 183-191. DOI: 10.3778/j.issn.1002-8331.2309-0419

Abstract （190）

PDF（pc）（1074KB）（172）

Aiming at the problem of low detection accuracy caused by small targets, different scales and complex backgrounds in UAV aerial images, a target detection algorithm for UAV aerial images based on improved YOLOv5 is proposed. The algorithm introduces a target detection head method Dynamic Head with multiple attention mechanisms to replace the original detection head and improves the detection performance of the detection head in complex backgrounds. An upsampling and Concat operation is added to the neck part of the original model, and a multi-scale feature detection including minimal, small and medium targets is performed to improve the feature extraction ability of the model for medium and small targets. DenseNet is introduced and integrated with the C3 module of YOLOv5s backbone network to propose the C3_DenseNet module to enhance feature transfer and prevent model overfitting. The DY-YOLOv5 algorithm is applied to the VisDrone 2019 dataset, and the mean average precision (mAP) reaches 43.9%, which is 11.4 percentage points higher than the original algorithm. The recall rate (Recall) is 41.7%, which is 9.0 percentage points higher than the original algorithm. Experimental results show that the improved algorithm significantly improves the accuracy of target detection in UAV aerial images.

Most Read articles