Most Read articles


    Published in last 1 year
    Review of SLAM Based on Lidar
    LIU Mingzhe, XU Guanghui, TANG Tang, QIAN Xiaojian, GENG Ming
    Computer Engineering and Applications    2024, 60 (1): 1-14.   DOI: 10.3778/j.issn.1002-8331.2308-0455
    Abstract (867)      PDF (pc) (854 KB) (574)
    Simultaneous localization and mapping (SLAM) is a crucial technology for autonomous mobile robots and autonomous driving systems, with a laser scanner (also known as lidar) playing a vital role as a supporting sensor for SLAM algorithms. This article provides a comprehensive review of lidar-based SLAM algorithms. Firstly, it introduces the overall framework of lidar-based SLAM, providing detailed explanations of the functions of the front-end odometry, back-end optimization, loop closure detection, and map building modules, along with a summary of the algorithms used. Secondly, it presents descriptions and summaries of representative open-source algorithms in a sequential order of 2D to 3D and single-sensor to multi-sensor fusion. Additionally, it discusses commonly used open-source datasets, precision evaluation metrics, and evaluation tools. Lastly, it offers an outlook on the development trends of lidar-based SLAM technology from four dimensions: deep learning, multi-sensor fusion, multi-robot collaboration, and robustness research.
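The abstract mentions precision evaluation metrics without naming one. As a minimal sketch, assuming the metric of interest is absolute trajectory error (a common choice, not confirmed by the abstract), the RMSE over position errors could be computed as below. The function name `ate_rmse` is hypothetical, and the alignment step that real evaluation tools perform is omitted.

```python
import math

def ate_rmse(estimated, ground_truth):
    """Root-mean-square position error between two trajectories.

    Assumption: both trajectories are already time-associated and
    expressed in the same frame (the alignment step is not shown).
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est_pose, gt_pose))
        for est_pose, gt_pose in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Each pose here is a tuple of coordinates, so the same function works for 2D and 3D trajectories.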
    Study on Optimization of Cooperative Distribution Path Between UAVs and Vehicles Under Rural E-Commerce Logistics
    XU Ling, YANG Linchao, ZHU Wenxing, ZHONG Shaojun
    Computer Engineering and Applications    2024, 60 (1): 310-318.   DOI: 10.3778/j.issn.1002-8331.2306-0115
    Abstract (845)      PDF (pc) (666 KB) (705)
    Drone delivery has emerged as a significant solution to the challenges of last-mile logistics. Collaborative delivery by drones and vehicles overcomes the limited delivery capacity of drones and enhances safety, making it a vital approach for involving drones in the delivery process. To tackle the difficulty and high cost of last-mile delivery in rural e-commerce logistics, this study constructs a mixed-integer programming model. The objective is to minimize delivery cost subject to constraints such as the collaborative drone-vehicle mode and multi-drone, multi-parcel delivery. A two-stage algorithm is proposed to optimize the paths for drone-vehicle collaborative delivery. In the first stage, a constrained adaptive K-means algorithm is used to determine the range of vehicle docking points. In the second stage, an improved genetic algorithm that incorporates hill climbing and splitting operators is employed to identify the optimal delivery paths for drones and vehicles. A case study is then conducted to validate the feasibility and effectiveness of the model and algorithm. The findings are expected to offer insights and references for reducing cost and improving efficiency in last-mile delivery for rural e-commerce logistics.
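The first stage clusters customers to find candidate vehicle docking points. The sketch below is plain K-means (Lloyd's algorithm), not the constrained adaptive variant the paper proposes, whose details the abstract does not give. The function names and the deterministic initialization from the first `k` points are assumptions made for illustration.

```python
def assign(points, centers):
    """Group points by the index of their nearest center."""
    clusters = [[] for _ in centers]
    for x, y in points:
        nearest = min(
            range(len(centers)),
            key=lambda i: (x - centers[i][0]) ** 2 + (y - centers[i][1]) ** 2,
        )
        clusters[nearest].append((x, y))
    return clusters

def kmeans(points, k, max_iterations=50):
    """Plain K-means; centers stand in for candidate docking points."""
    # Assumption: initialize from the first k points to stay deterministic.
    centers = [tuple(map(float, p)) for p in points[:k]]
    for _ in range(max_iterations):
        clusters = assign(points, centers)
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centers[i]  # keep an empty cluster's center in place
            for i, c in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)
```

A constrained variant would additionally reject or split clusters that violate, for example, a drone range limit; that logic is left out here.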
    Research on Urban Logistics Distribution Mode of Bus-Assisted Drones
    PENG Yong, REN Zhi
    Computer Engineering and Applications    2024, 60 (7): 335-343.   DOI: 10.3778/j.issn.1002-8331.2212-0252
    Abstract (648)      PDF (pc) (755 KB) (536)
    The rapid development of e-commerce is forcing the logistics industry to keep transforming and upgrading. Because local governments encourage public transport and advocate green, low-carbon logistics, a bus-assisted drone distribution mode is studied. After the problem is described, a mathematical model that minimizes distribution cost is constructed, and a smart general variable neighborhood search metaheuristic is designed to solve it. To improve the efficiency of the algorithm, K-means clustering and a greedy algorithm are introduced to generate the initial solution. First, on instances of different scales, several local search strategies and several algorithms are compared to verify the effectiveness of the algorithm. Second, using standard CVRP instances, the single-truck distribution mode and the truck-drone collaborative distribution mode are compared with the bus-assisted drone mode to demonstrate its cost and time advantages. Finally, Beijing Bus Rapid Transit Line 2 and its surrounding customer points are selected, and a sensitivity analysis is performed by varying the bus stop spacing and the departure interval. The results show that increasing the stop spacing has a greater impact than changing the departure interval.
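The abstract says a greedy algorithm helps build the initial solution but does not state the greedy rule. One plausible reading, shown here as an assumption, is a nearest-neighbour construction that always visits the closest unserved customer next. The name `greedy_route` is hypothetical.

```python
import math

def greedy_route(depot, customers):
    """Nearest-neighbour tour: start at the depot, repeatedly visit the
    closest unserved customer, then return to the depot."""
    route = [depot]
    remaining = list(customers)
    current = depot
    while remaining:
        nearest = min(remaining, key=lambda c: math.dist(current, c))
        remaining.remove(nearest)
        route.append(nearest)
        current = nearest
    route.append(depot)
    return route
```

A tour built this way is usually far from optimal, which is why it serves only as a starting point for a neighborhood search.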
    Survey of Sentiment Analysis Algorithms Based on Multimodal Fusion
    GUO Xu, Mairidan Wushouer, Gulanbaier Tuerhong
    Computer Engineering and Applications    2024, 60 (2): 1-18.   DOI: 10.3778/j.issn.1002-8331.2305-0439
    Abstract (623)      PDF (pc) (954 KB) (453)
    Sentiment analysis is an emerging technology that aims to explore people’s attitudes toward entities and can be applied to various domains and scenarios, such as product evaluation analysis, public opinion analysis, mental health analysis and risk assessment. Traditional sentiment analysis models focus on text content, yet some special forms of expression, such as sarcasm and hyperbole, are difficult to detect through text. As technology continues to advance, people can now express their opinions and feelings through multiple channels such as audio, images and videos, so sentiment analysis is shifting to multimodality, which brings new opportunities for sentiment analysis. Multimodal sentiment analysis contains rich visual and auditory information in addition to textual information, and the implied sentiment polarity (positive, neutral, negative) can be inferred more accurately using fusion analysis. The main challenge of multimodal sentiment analysis is the integration of cross-modal sentiment information; therefore, this paper focuses on the framework and characteristics of different fusion methods and describes the popular fusion algorithms in recent years, and discusses the current multimodal sentiment analysis in small sample scenarios, in addition to the current development status, common datasets, feature extraction algorithms, application areas and challenges. It is expected that this review will help researchers understand the current state of research in the field of multimodal sentiment analysis and be inspired to develop more effective models.
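To make the idea of fusion concrete, the sketch below shows decision-level (late) fusion, one simple family among the fusion methods such a survey covers. It assumes each modality already produces a probability for each polarity; the label order, the function name `late_fusion`, and the weighting scheme are illustrative choices, not taken from the paper.

```python
LABELS = ("negative", "neutral", "positive")

def late_fusion(modality_probs, weights=None):
    """Weighted average of per-modality polarity probabilities.

    modality_probs maps a modality name to probabilities in LABELS order.
    """
    names = list(modality_probs)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal trust by default
    total = sum(weights[name] for name in names)
    fused = [
        sum(weights[name] * modality_probs[name][i] for name in names) / total
        for i in range(len(LABELS))
    ]
    best = max(range(len(LABELS)), key=lambda i: fused[i])
    return LABELS[best], fused
```

With a sarcastic utterance, the text model may lean positive while audio and visual cues lean negative; averaging lets the non-text modalities overturn the text-only decision.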
    Survey on Credit Card Transaction Fraud Detection Based on Machine Learning
    JIANG Hongxun, JIANG Junyi, LIANG Xun
    Computer Engineering and Applications    2023, 59 (21): 1-25.   DOI: 10.3778/j.issn.1002-8331.2302-0129
    Abstract (603)      PDF (pc) (674 KB) (373)
    Machine learning for credit card transaction fraud detection has distinctive characteristics and faces a more complex environment. Because human intelligence intervenes, machine learning encounters harder challenges in fraud detection than in face recognition or autonomous driving, and applying only the processes of engineering disciplines leads to failure. This paper traces the research history of credit card anti-fraud since 2000; identifies the definitions, scope, technical streams, applications, and other key concepts of detection-oriented machine learning, together with their interconnections; analyzes the general architecture of fraud detection and summarizes the state of the art in transaction fraud detection research in terms of feature engineering, models and algorithms, and evaluation metrics; discusses various detection algorithms for credit card transaction fraud and enumerates their original intentions, core ideas, solution methods, advantages and disadvantages, and relevant extensions; highlights unsupervised, supervised, and semi-supervised learning models for fraud recognition, as well as ensembles such as model cascading and aggregation; and addresses three major challenges, namely massive data, sample skew, and concept drift, compiling the latest progress in alleviating them. The paper concludes with the limitations, controversies, and challenges of machine learning for credit card fraud recognition and provides trend analysis and suggestions for future research directions.
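Sample skew is easiest to appreciate through the evaluation metrics. The sketch below, with a hypothetical helper `fraud_metrics`, computes accuracy, precision, recall, and F1 from binary labels (1 = fraud). It is an illustration of why accuracy misleads on skewed data, not a method from the paper.

```python
def fraud_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0 if undefined
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

On a set with 2 frauds among 100 transactions, a model that never flags anything scores 98% accuracy yet has zero recall, which is why recall and F1 are the more informative figures here.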
    Review of Path Planning Algorithms for Robot Navigation
    CUI Wei, ZHU Fazheng
    Computer Engineering and Applications    2023, 59 (19): 10-20.   DOI: 10.3778/j.issn.1002-8331.2301-0088
    Abstract (598)      PDF (pc) (595 KB) (323)
    Path planning is one of the key technologies for robot navigation. An excellent path planning algorithm can quickly find the best collision-free path and improve operational efficiency. Most existing classification methods have difficulty expressing the differences and connections between algorithms. To distinguish path planning algorithms more clearly, they are divided by principle and nature into graph search-based, bionic-based, potential field-based, velocity space-based, and sampling-based algorithms. This paper introduces the concept, characteristics, and development status of each type, analyzes the more widely used sampling-based algorithms from the perspective of single-query and multi-query algorithms, and compares and summarizes the advantages and problems of the different types. Finally, future development trends of robot path planning algorithms are discussed in terms of multi-robot collaboration, multi-algorithm fusion, and adaptive planning.
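As an illustration of the graph search-based category, here is a minimal A* sketch on a 4-connected occupancy grid. The grid encoding (`#` for obstacles), unit step cost, and Manhattan heuristic are assumptions chosen for brevity; the paper surveys many algorithms and does not prescribe this one.

```python
import heapq

def astar(grid, start, goal):
    """A* on a grid of strings; '#' marks an obstacle.

    Returns the list of (row, col) cells from start to goal, or None.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for 4-connected unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == "#":
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

Sampling-based planners such as RRT trade this exhaustive grid expansion for random exploration, which scales better in high-dimensional spaces.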
    Review of Deep Learning Methods for MRI Reconstruction
    DENG Gewen, WEI Guohui, MA Zhiqing
    Computer Engineering and Applications    2023, 59 (20): 67-76.   DOI: 10.3778/j.issn.1002-8331.2302-0057
    Abstract (565)      PDF (pc) (580 KB) (305)
    Magnetic resonance imaging (MRI) is a commonly used clinical imaging technique, but its long imaging time limits further development. Image reconstruction from undersampled k-space data has been an important part of accelerating MRI. In recent years, deep learning has shown great potential in MRI reconstruction, with reconstruction results and efficiency better than those of traditional compressed sensing methods. To organize and summarize current deep learning-based MRI reconstruction methods, this paper first introduces the definition of the MRI reconstruction problem, then analyzes the application of deep learning in data-driven end-to-end reconstruction and model-driven unrolled optimization reconstruction, next presents evaluation metrics and common datasets for reconstruction, and finally discusses the challenges facing current MRI reconstruction and future research directions.
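The reconstruction problem can be illustrated in one dimension. The sketch below transforms a signal to frequency space with a naive DFT, zeroes the unsampled frequencies, and inverts the result, which is the zero-filled baseline that learned methods try to improve on. The 1-D setting, the function names, and the magnitude output are simplifying assumptions; real MRI works on 2-D or 3-D complex data.

```python
import cmath

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse is 1/n-normalized)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out

def zero_filled_recon(signal, mask):
    """Keep only the sampled frequencies (mask value 1), fill the rest
    with zeros, and return the magnitude of the inverse transform."""
    kspace = dft(signal)
    undersampled = [v if keep else 0 for v, keep in zip(kspace, mask)]
    return [abs(v) for v in dft(undersampled, inverse=True)]
```

Sampling every second frequency halves the acquisition but folds the signal onto itself: each output sample becomes the average of two input samples half a period apart. That aliasing is the artifact that compressed sensing and deep learning methods aim to remove.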
    Image Inpainting Algorithm Based on Deep Neural Networks
    LYU Jianfeng, SHAO Lizhen, LEI Xuemei
    Computer Engineering and Applications    2023, 59 (20): 1-12.   DOI: 10.3778/j.issn.1002-8331.2303-0111
    Abstract (441)      PDF (pc) (720 KB) (449)
    With the rapid development of deep learning, computer vision technology is being applied more and more widely. At the same time, image inpainting that uses deep neural networks to exploit the known information in a damaged image has become a hot topic. Image inpainting methods based on deep neural networks from recent years are reviewed and analyzed. First, the methods are classified and summarized from the perspective of model optimization. Then common datasets and performance evaluation indicators are introduced, and the performance of various deep neural network-based image inpainting algorithms is evaluated and analyzed on the relevant datasets. Finally, the challenges faced by existing image inpainting methods are analyzed, and future research is discussed.
    Research on Improving YOLOv7’s Small Target Detection Algorithm
    LI Anda, WU Ruiming, LI Xudong
    Computer Engineering and Applications    2024, 60 (1): 122-134.   DOI: 10.3778/j.issn.1002-8331.2307-0004
    Abstract (396)      PDF (pc) (884 KB) (234)
    With the continuous application of deep learning in domestic object detection, conventional detection of large and medium objects has made astonishing progress. However, owing to the limitations of convolutional networks themselves, missed and false detections remain a problem in small object detection. Taking the Visdrone 2019 and FloW-Img datasets as examples, the YOLOv7 model is studied, and the ELAN module of the backbone network is improved. The Focal NeXt block is integrated into the long and short gradient paths of the ELAN module to enhance the feature quality of small targets and enrich the contextual information contained in the output features. The RepLKDeXt module is introduced into the head network; it not only replaces the SPPCSPC module to simplify the overall structure of the model but also optimizes the ELAN-H structure using multi-channel, large convolutional kernels and Cat operations. Finally, the SIOU loss function replaces the CIOU function to improve the robustness of the model. The results show that the improved YOLOv7 model reduces the number of parameters and the computational complexity while its detection performance remains approximately unchanged on the Visdrone 2019 dataset, which has a high density of small targets. On the FloW-Img dataset, where small targets are sparse, detection performance increases by 9.05 percentage points. The model is thus further simplified and more widely applicable.
    Review of Fault Diagnosis Techniques for UAV Flight Control Systems
    AN Xue, LI Shaobo, ZHANG Yizong, ZHANG Ansi
    Computer Engineering and Applications    2023, 59 (24): 1-15.   DOI: 10.3778/j.issn.1002-8331.2305-0137
    Abstract (387)      PDF (pc) (917 KB) (1466)
    In recent years, unmanned aerial vehicles(UAVs) have been widely used in various complex fields of military and civilian applications due to their unique advantages such as low operating costs and high mobility. At the same time, the complex and diverse missions have put forward higher requirements for the reliability and safety of UAV systems. The UAV fault diagnosis technology can provide timely and accurate diagnosis results, which helps the maintenance, repair and servicing of UAVs, and is of great significance in enhancing the combat effectiveness of UAVs. Therefore, this paper firstly analyses UAV flight control systems, and classifies the faults. Secondly, the research methods and status quo of UAV fault diagnosis technology are analysed and summarised. Finally, the main challenges faced by UAV fault diagnosis technology are discussed and the future development direction is pointed out; the aim is to provide some reference for researchers in the field of UAV fault diagnosis technology and to promote the improvement of UAV fault diagnosis technology level in China.
    Multi-Object Tracking Algorithm Based on CNN-Transformer Feature Fusion
    ZHANG Yingjun, BAI Xiaohui, XIE Binhong
    Computer Engineering and Applications    2024, 60 (2): 180-190.   DOI: 10.3778/j.issn.1002-8331.2211-0028
    Abstract (363)      PDF (pc) (787 KB) (211)
    In a convolutional neural network (CNN), convolution can efficiently extract local features of an object but has difficulty capturing a global representation; in the visual Transformer, the attention mechanism can capture long-distance feature dependencies but ignores local feature details. To solve these problems, a multi-object tracking algorithm, CTMOT (CNN transformer multi-object tracking), which uses a CNN-Transformer hybrid network for feature extraction and fusion, is proposed. First, a backbone network based on CNN and Transformer extracts the local and global features of the image, respectively. Second, a two-way bridge module (TBM) fully integrates the two kinds of features. Then the fused features are fed to two parallel decoders. Finally, the detection boxes and tracking boxes output by the decoders are matched to obtain the final tracking result and complete the multi-object tracking task. Evaluated on the MOT17, MOT20, KITTI, and UA-DETRAC multi-object tracking datasets, CTMOT reaches MOTA values of 76.4%, 66.3%, 92.36%, and 88.57%, respectively. This is comparable to SOTA methods on the MOT datasets and achieves SOTA performance on the KITTI dataset. The MOTP and IDs indicators also reach SOTA level on all datasets. In addition, because object detection and association are completed at the same time, tracking can be carried out end to end at up to 35 FPS, which shows that CTMOT achieves a good balance between real-time performance and tracking accuracy and has great potential.
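For readers unfamiliar with the MOTA figures quoted above, the sketch below computes MOTA under the standard CLEAR MOT definition, which combines missed objects, false positives, and identity switches into one score. The function and parameter names are hypothetical, and the counts in the usage note are made-up inputs, not results from the paper.

```python
def mota(false_negatives, false_positives, id_switches, ground_truth_objects):
    """Multi-object tracking accuracy:
    1 - (FN + FP + IDSW) / GT, with all counts summed over every frame."""
    if ground_truth_objects <= 0:
        raise ValueError("ground_truth_objects must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / ground_truth_objects
```

For example, 10 misses, 5 false positives, and 5 identity switches over 100 ground-truth objects give a MOTA of 0.8. The score can be negative when the tracker makes more errors than there are objects.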
    Review of Intelligent Decision Optimization of Electric Vehicle Charging Stations Location
    WEI Guanyuan, WANG Guanqun, RUAN Guanmei, GENG Na
    Computer Engineering and Applications    2023, 59 (21): 52-65.   DOI: 10.3778/j.issn.1002-8331.2302-0021
    Abstract (348)      PDF (pc) (683 KB) (216)
    A reasonable location of electric vehicle (EV) charging stations plays an important role in promoting the development of the EV industry and the strategic layout of urban transportation. The literature on intelligent decision optimization of charging station location is systematically reviewed to provide a reference for future charging station planning. The basic principles and influencing factors of EV charging station location are elaborated. Methods for estimating charging demand based on EV trip simulation and data analysis are summarized. Location models for EV charging stations based on point demand, origin-destination pair flow demand, and EV trajectories are introduced. Exact, heuristic, and deep learning algorithms for solving the EV charging station location model are summarized. Finally, limitations of existing studies are discussed, and future research focuses and directions are outlined.
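A point-demand location model can be illustrated with a greedy heuristic for maximum coverage: repeatedly open the candidate site that covers the most still-unserved demand. This is a sketch of one simple heuristic under assumed inputs (a site-to-covered-points map and per-point demand), not a model taken from the reviewed literature. The name `greedy_station_sites` is hypothetical.

```python
def greedy_station_sites(coverage, demand, budget):
    """Pick up to `budget` sites, each time taking the site that adds
    the most uncovered demand.

    coverage: site name -> set of demand points it can serve
    demand:   demand point -> charging demand at that point
    """
    chosen = []
    covered = set()

    def gain(site):
        return sum(demand[p] for p in coverage[site] - covered)

    for _ in range(budget):
        candidates = [s for s in coverage if s not in chosen]
        if not candidates:
            break
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # no remaining site adds any demand
        chosen.append(best)
        covered |= coverage[best]
    return chosen, sum(demand[p] for p in covered)
```

Exact methods would instead solve the underlying integer program; the greedy rule trades optimality for speed.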
    Improved Road Damage Detection Algorithm of YOLOv8
    LI Song, SHI Tao, JING Fangke
    Computer Engineering and Applications    2023, 59 (23): 165-174.   DOI: 10.3778/j.issn.1002-8331.2306-0205
    Abstract (347)      PDF (pc) (671 KB) (275)
    Road damage detection is an important task for ensuring road safety and enabling timely repair of road damage. To address the low detection efficiency, high cost, and difficulty of deployment on mobile terminal devices of existing road damage detection algorithms, a lightweight road damage detection algorithm based on an improved YOLOv8, YOLOV8-Road Damage (YOLOV8-RD), is proposed. First, combining the advantages of CNN and Transformer, a BOT module that can extract global and local feature information from road damage images is proposed to adapt to the large-span, elongated features of crack objects. Then coordinate attention (CA) is introduced at the end of the backbone network and the neck network to embed location information into the channel attention, strengthen feature extraction, and suppress interference from irrelevant features. In addition, the C2fGhost module is used in the YOLOv8 neck network to reduce floating-point computation during feature channel fusion, reduce the number of model parameters, and improve feature expression. Experimental results show that on the RDD2022 and Road Damage datasets, the improved algorithm raises mAP50 by 2% and 3.7%, respectively, over the original algorithm, while the number of model parameters is only 2.8×10^6 and the computation amount is only 7.3×10^9, reductions of 6.7% and 8.5%, respectively. The detection speed reaches 88 FPS, so road damage targets can be detected accurately in real time. Comparison with other mainstream object detection algorithms verifies the effectiveness and superiority of the method.
    Algorithm of Reconstructed SPPCSPC and Optimized Downsampling for Small Object Detection
    QI Xiangming, CHAI Rui, GAO Yimeng
    Computer Engineering and Applications    2023, 59 (20): 158-166.   DOI: 10.3778/j.issn.1002-8331.2305-0004
    Abstract (343)      PDF (pc) (651 KB) (182)
    A small object detection algorithm based on YOLOv7 with a reconstructed SPPCSPC module and optimized downsampling is proposed. The algorithm aims to address the challenges of detecting small objects in images, including mutual occlusion, complex backgrounds, and a limited number of feature points. To improve the detection of densely packed small objects, the SPPCSPC module of the backbone network is enhanced for dense target areas by cropping the CBS layer, introducing the SimAM attention mechanism, and reducing the pooling kernel. These modifications allow better feature extraction for small targets that are mutually occluded. In the neck network, the SConv in the downsampling structure is replaced by SPD Conv, and a quadruple downsampling branch is added. These changes reduce feature loss and capture more small target features in complex backgrounds. Additionally, the CIoU loss function of the network model is replaced by Wise IoU, which focuses on boxes of general quality and improves convergence speed. Comparative and ablation experiments on the public dataset VisDrone2021 show that, compared with the original YOLOv7 algorithm, the proposed algorithm increases mAP by 5.09 percentage points, achieves 40 FPS, and reduces the parameter count by 2.5 MB. This shows that the modified algorithm significantly improves detection accuracy while maintaining fast inference and reducing the number of parameters. Furthermore, in a generalization experiment on the public dataset VOC2007+2012, mAP increases by 3.35 percentage points, indicating that the improved algorithm is versatile and can be applied to a wide range of scenarios.
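The reason SPD Conv loses less detail than a strided convolution is its space-to-depth step, which rearranges pixels into extra channels instead of discarding them. The sketch below shows only that rearrangement for a single-channel map; the channel ordering and the function name `space_to_depth` are assumptions, and the non-strided convolution that follows in SPD Conv is omitted.

```python
def space_to_depth(feature_map, scale=2):
    """Split an H x W map into scale*scale sub-maps of size
    (H/scale) x (W/scale) by taking every scale-th pixel at each offset.
    Every input pixel appears in exactly one output channel."""
    rows, cols = len(feature_map), len(feature_map[0])
    if rows % scale or cols % scale:
        raise ValueError("height and width must be divisible by scale")
    return [
        [row[dx::scale] for row in feature_map[dy::scale]]
        for dx in range(scale)
        for dy in range(scale)
    ]
```

A 4 x 4 map becomes four 2 x 2 channels, so spatial resolution halves while all 16 values survive, whereas a stride-2 convolution samples only a quarter of the positions.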
    Review of Application of Deep Learning in Colon Polyp Segmentation
    SUN Fuyan, WANG Qiong, LYU Zongwang, GONG Chunyan
    Computer Engineering and Applications    2023, 59 (23): 15-27.   DOI: 10.3778/j.issn.1002-8331.2303-0124
    Abstract (305)      PDF (pc) (626 KB) (210)
    Most colorectal cancers originate from malignant lesions of colon polyps. Using a computer-aided diagnosis system to segment colon polyps automatically and accurately is of great clinical significance, because it can help doctors improve the detection rate of polyps during colonoscopy. Deep learning is now widely used in medical image segmentation, and colon polyp segmentation algorithms based on deep learning have made significant progress. First, traditional polyp segmentation algorithms and their advantages and limitations are briefly introduced. Second, deep learning polyp segmentation algorithms are reviewed in three categories: segmentation models based on the classical CNN structure, on the U-Net structure, and on multi-model fusion. The improvement strategies of these algorithms and their advantages and limitations are then summarized. The public datasets of colon polyp images and the data preprocessing methods are also summarized. Finally, the challenges of polyp segmentation based on deep learning are summarized, and future research directions in this field are discussed.
    Improved YOLOv8s Model for Small Object Detection from Perspective of Drones
    PAN Wei, WEI Chao, QIAN Chunyu, YANG Zhe
    Computer Engineering and Applications    2024, 60 (9): 142-150.   DOI: 10.3778/j.issn.1002-8331.2312-0043
    Abstract (305)      PDF (pc) (5858 KB) (434)
    Object detection from the perspective of drones yields less precise results because image targets are small and densely distributed, classes are unevenly distributed, and model size is limited by hardware conditions. A new improved model based on YOLOv8s with multiple attention mechanisms is proposed. To solve the problem of shared attention weight parameters in receptive field features and to enhance feature extraction, receptive field attention convolution and the CBAM (convolutional block attention module) attention mechanism are introduced into the backbone, adding attention weights in the channel and spatial dimensions. Large separable kernel attention is introduced into the feature pyramid pooling layers to increase information fusion between features at different levels. Feature layers rich in semantic information about small targets are added to improve the neck structure. The inner-IoU loss function is used to improve the MPDIoU (minimum point distance based IoU) function, and the resulting inner-MPDIoU replaces the original loss function to enhance learning on difficult samples. Experimental results show that the improved YOLOv8s model raises mAP, P, and R by 16.1%, 9.3%, and 14.9%, respectively, on the VisDrone dataset, surpasses YOLOv8m in performance, and can be applied effectively to unmanned aerial vehicle visual detection tasks.
    Graph Convolutional Neural Network and Its Application in Image Recognition
    LI Wenjing, BAI Jing, PENG Bin, YANG Zhanyuan
    Computer Engineering and Applications    2023, 59 (22): 15-35.   DOI: 10.3778/j.issn.1002-8331.2302-0273
    Abstract (302)      PDF (pc) (803 KB) (282)
    Convolutional neural networks have found widespread application in image recognition, demonstrating remarkable feature extraction capabilities. However, they are inherently designed for processing structured data in Euclidean space, making them less suitable for handling unstructured data. To address this limitation, the graph convolutional neural network (GCN) leverages spectral and spatial methods to extend the scope of convolutional operations, enabling feature learning in non-Euclidean spaces. GCN possesses translational invariance for graph data, facilitating representation learning for unstructured data. First, the basic principles of, and improvements to, the two types of graph convolutional neural networks, spectral domain-based and spatial domain-based, are explained. Then, focusing on image recognition, the application of graph convolutional neural networks in multi-label image recognition, skeleton-based action recognition, and hyperspectral image classification is introduced, the research progress is summarized, and the performance of related models is compared and analyzed. Finally, the paper is summarized and future development directions are discussed.
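As a concrete instance of graph convolution, the sketch below implements one layer of the widely used first-order propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), where D is the degree matrix of A + I. Choosing this particular rule is an assumption, since the abstract covers both spectral and spatial families without singling one out. Plain lists stand in for tensors, and the function names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def gcn_layer(adjacency, features, weights):
    """One graph convolution: normalize (A + I), aggregate neighbour
    features, apply the weight matrix, then ReLU."""
    n = len(adjacency)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_degree = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    normalized = [
        [inv_sqrt_degree[i] * a_hat[i][j] * inv_sqrt_degree[j] for j in range(n)]
        for i in range(n)
    ]
    out = matmul(matmul(normalized, features), weights)
    return [[max(0.0, value) for value in row] for row in out]
```

Because the aggregation depends only on the adjacency matrix, the same layer applies to any graph, which is what lets convolution extend beyond regular pixel grids.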
    Improved YOLOv8 Object Detection Algorithm for Traffic Sign Target
    TIAN Peng, MAO Li
    Computer Engineering and Applications    2024, 60 (8): 202-212.   DOI: 10.3778/j.issn.1002-8331.2309-0415
    Abstract (299)      PDF (pc) (937 KB) (238)
    Although current detection technology is becoming increasingly mature, detecting small targets in complex environments remains the most difficult point in research. To address the high proportion of small targets among traffic signs in road traffic scenarios and the strong environmental interference, a traffic sign detection algorithm based on an improved YOLOv8 is proposed. Because small targets are prone to missed detection, the bi-level routing attention (BRA) mechanism is used to improve the network's perception of small targets. In addition, a deformable convolution module, deformable convolution V3 (DCNV3), is used. It extracts features of irregular shapes in the feature map more effectively, so that the backbone network adapts better to irregular spatial structures and attends more accurately to important targets, thereby improving the model's ability to detect occluded and overlapping targets. Both the DCNV3 and BRA modules improve the accuracy of the model without increasing its weight. At the same time, the Inner-IOU loss function based on auxiliary bounding boxes is introduced. On four datasets, RoadSign, CCTSDB, TSDD, and GTSDB, small-sample training, large-sample training, single-target detection, and multi-target detection are performed, and the experimental results improve. The results on the RoadSign dataset are the best: mAP50 and mAP50:95 of the improved YOLOv8 model reach 90.7% and 75.1%, respectively, increases of 5.9 and 4.8 percentage points over the baseline model. The experimental results show that the improved YOLOv8 model effectively detects traffic signs in complex road scenarios.
    Survey on Video-Text Cross-Modal Retrieval
    CHEN Lei, XI Yimeng, LIU Libo
    Computer Engineering and Applications    2024, 60 (4): 1-20.   DOI: 10.3778/j.issn.1002-8331.2306-0382
    Abstract (293)      PDF (pc) (3662 KB) (274)
    Modalities define the specific forms in which data exist. The swift expansion of various modal data types has brought multimodal learning into the limelight. As a crucial subset of this field, cross-modal retrieval has achieved noteworthy advances, particularly in integrating images and text. However, videos, as opposed to images, encapsulate a richer array of modal data and offer a more extensive spectrum of information. This richness aligns well with the growing user demand for comprehensive and adaptable information retrieval solutions. Consequently, video-text cross-modal retrieval has recently emerged as a burgeoning area of research. To give a thorough account of video-text cross-modal retrieval and its state-of-the-art developments, a methodical review and summary of the existing representative methods is conducted. First, current deep learning-based unidirectional and bidirectional video-text cross-modal retrieval methods are analyzed, including an in-depth exploration of seminal works within each category and their strengths and weaknesses. Next, from an experimental viewpoint, benchmark datasets and evaluation metrics specific to video-text cross-modal retrieval are introduced, and the performance of several standard methods on the benchmark datasets is compared. Finally, the application prospects and future research challenges of video-text cross-modal retrieval are discussed.
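The abstract refers to evaluation metrics without listing them. Recall@K is a common choice in retrieval benchmarks and is used here as an assumed example: it is the fraction of queries whose correct item appears among the top K results. The sketch assumes a square similarity matrix in which query i matches item i; the name `recall_at_k` is hypothetical.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose matching item (same index) ranks in
    the top k by similarity score."""
    hits = 0
    for query_index, scores in enumerate(similarity):
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if query_index in ranked[:k]:
            hits += 1
    return hits / len(similarity)
```

Transposing the matrix gives the metric for the opposite retrieval direction, so the same function covers both text-to-video and video-to-text evaluation.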
    Fine-Grained Image Classification Combining Swin and Multi-Scale Feature Fusion
    XIANG Jianwen, CHEN Minrong, YANG Baibing
    Computer Engineering and Applications    2023, 59 (20): 147-157.   DOI: 10.3778/j.issn.1002-8331.2211-0456
    Abstract (293)      PDF (pc) (718 KB) (185)
    To address the high intra-class variance and low inter-class variance in fine-grained image classification, this paper proposes a fine-grained image classification model based on Swin and multi-scale feature fusion (SwinFC). The Swin Transformer model, with its multi-stage hierarchical design, is used as a new visual backbone network to extract local and global information and multi-scale features. Then a module integrating external-dependency attention and cross-space attention is embedded on the branches of each stage to capture potential correlations among data samples and discriminative feature information from different spatial directions, enhancing the information representation in each stage of the network. Further, a feature fusion module is introduced to perform multi-scale fusion of the features extracted at each stage, so that the network can learn more comprehensive, complementary, and diverse feature information. Finally, to enlarge inter-class differences and narrow intra-class differences, a feature selection module is adopted to select important and discriminative image patches, enhancing the discriminative power of the network. Experimental results show that the proposed method achieves classification accuracies of 92.5%, 91.8%, and 85.84% on three public fine-grained image datasets, CUB-200-2011, NABirds, and WebFG-496, respectively, outperforming most mainstream methods. Moreover, compared with the benchmark model Swin, classification performance is improved by 1.4, 2.6, and 4.86 percentage points, respectively.
    Ghost-YOLOv8 Detection Algorithm for Traffic Signs
    XIONG Enjie, ZHANG Rongfen, LIU Yuhong, PENG Jingxiang
    Computer Engineering and Applications    2023, 59 (20): 200-207.   DOI: 10.3778/j.issn.1002-8331.2306-0032
    Abstract288)      PDF(pc) (564KB)(188)       Save
    To address the low accuracy and imprecise localization of traffic sign detection in current conventional network models, a Ghost-YOLOv8 traffic sign detection model is proposed, built by optimizing and improving YOLOv8n. First, GhostConv replaces some Conv layers and a newly designed C2fGhost module replaces some C2f modules, which reduces the model's parameter count and enhances its detection performance. Second, a GAM attention mechanism module is added to the Neck to strengthen the semantic and positional information in the features, improving the model's feature fusion ability. Third, to counter the loss of semantic information when detecting small targets, a small target detection layer is added to better combine deep and shallow semantic information. Finally, the GIoU bounding box loss function replaces the original loss function, which improves the network's bounding box regression. Experimental results show that, on the Chinese traffic sign detection dataset TT100K, the precision and mean average precision (mAP) of the improved model are 9.5 and 6.5 percentage points higher than those of the original model, while the number of model parameters and the model size are reduced by 0.223×10^9 and 0.2 MB, respectively. In summary, the proposed model improves detection accuracy while reducing parameter count and model size, significantly outperforms the comparison algorithms, meets the requirements of edge computing devices, and has practical application value.
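    The GIoU loss mentioned above can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x1, y1, x2, y2) with positive area; the function names are illustrative and this is not the authors' implementation.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose_area = enclose_w * enclose_h

    # Penalize the empty part of the enclosing box.
    return iou - (enclose_area - union) / enclose_area


def giou_loss(box_a, box_b):
    """Loss in [0, 2]: 0 for identical boxes, larger for distant boxes."""
    return 1.0 - giou(box_a, box_b)


same = giou((0, 0, 2, 2), (0, 0, 2, 2))
apart = giou((0, 0, 1, 1), (2, 0, 3, 1))
```

    Unlike plain IoU, the value stays informative for non-overlapping boxes: it becomes negative and decreases as the boxes move apart, so the loss still provides a training signal.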
    Improved YOLOv8 Multi-Scale and Lightweight Vehicle Object Detection Algorithm
    ZHANG Lifeng, TIAN Ying
    Computer Engineering and Applications    2024, 60 (3): 129-137.   DOI: 10.3778/j.issn.1002-8331.2309-0145
    Abstract283)      PDF(pc) (713KB)(266)       Save
    To address the high hardware requirements, low detection accuracy, and high miss rate for overlapping targets of traditional vehicle object detection models, a modified vehicle object detection algorithm based on YOLOv8, called RBT-YOLO, is proposed. The main network is reconstructed using a multi-scale fusion approach. BiFPN is improved by adding convolutional operations and adjusting the numbers of input and output channels to fit YOLOv8, enhancing its feature fusion capability. After the feature maps are output from the Neck section, a lightweight attention mechanism called Triplet Attention is introduced to enhance the feature extraction ability of the model. To handle the high target overlap found in real scenarios, SoftNMS (soft non-maximum suppression) replaces the original NMS so that the model treats candidate boxes more gently, which strengthens its detection capability and improves recall. Experimental results on the Pascal VOC and MS COCO datasets show that RBT-YOLO outperforms the original model: parameters and computations are reduced by approximately 60%, and mAP improves by 2.6 and 3.0 percentage points, respectively. It also compares favorably in both size and precision with other classic detection models, demonstrating strong practical utility.
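    The idea behind soft non-maximum suppression can be sketched as follows. The abstract does not say which Soft-NMS variant is used; this sketch assumes the Gaussian variant, and the parameter names and default values (sigma, score_threshold) are illustrative assumptions.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS. Returns (box_index, final_score) in selection order."""
    remaining = list(enumerate(scores))
    kept = []
    while remaining:
        # Select the highest-scoring remaining box.
        best_pos = max(range(len(remaining)), key=lambda p: remaining[p][1])
        best_idx, best_score = remaining.pop(best_pos)
        kept.append((best_idx, best_score))

        # Decay, rather than discard, the scores of overlapping boxes.
        rescored = []
        for idx, score in remaining:
            overlap = iou(boxes[best_idx], boxes[idx])
            score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_threshold:
                rescored.append((idx, score))
        remaining = rescored
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
```

    In this toy case hard NMS with a typical 0.5 IoU threshold would delete the second box outright; here it survives with a reduced score, which is what helps recall when real objects overlap heavily.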
    Survey of Agricultural Knowledge Graph
    TANG Wentao, HU Zelin
    Computer Engineering and Applications    2024, 60 (2): 63-76.   DOI: 10.3778/j.issn.1002-8331.2305-0203
    Abstract278)      PDF(pc) (629KB)(207)       Save
    Knowledge graphs are a key technology for knowledge engineering in the era of big data. Their powerful semantic understanding and knowledge organization capabilities can help resolve problems in the construction of modern agriculture such as scattered and disordered agricultural knowledge and insufficient knowledge coverage. Firstly, considering the complexity and specialized nature of agricultural data, the construction methods and framework of agricultural knowledge graphs are introduced. Secondly, the current domestic and international research status of the four key technologies in the construction of agricultural knowledge graphs (ontology construction, knowledge extraction, knowledge fusion, and knowledge reasoning) is reviewed. Furthermore, the systematic applications of agricultural knowledge graphs in decision support, intelligent question-answering systems, and recommendation systems are surveyed. Lastly, several specific instances of agricultural knowledge graphs are presented, and, based on the current state of research, prospects for future research directions are offered.
    Review of Deep Learning Methods Applied to Medical CT Super-Resolution
    TIAN Miaomiao, ZHI Lijia, ZHANG Shaomin, CHAO Daifu
    Computer Engineering and Applications    2024, 60 (3): 44-60.   DOI: 10.3778/j.issn.1002-8331.2303-0224
    Abstract270)      PDF(pc) (867KB)(205)       Save
    Image super-resolution (SR) is an important computer vision technique for improving image resolution, and it has significant research and application value in medical imaging. High-quality, high-resolution medical CT images are very important in current clinical practice. In recent years, deep learning-based super-resolution reconstruction of medical CT images has made remarkable progress. This paper reviews the representative methods in this field and systematically traces the development of medical CT image super-resolution reconstruction technology. Firstly, the basic theory of SR is introduced, and the commonly used evaluation metrics are given. Then, the paper focuses on the innovations and progress in deep learning-based super-resolution reconstruction of medical CT images, and makes a comprehensive comparative analysis of the main characteristics and performance of each method. Finally, the difficulties and challenges of medical CT image super-resolution reconstruction are discussed, and future development trends are summarized, with the aim of providing a reference for related research.
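    The abstract does not list the evaluation metrics it covers. Peak signal-to-noise ratio (PSNR) is one metric widely used for super-resolution, sketched below under the assumption of equally sized grayscale images stored as 2-D lists with an 8-bit value range; the function name and toy pixel values are illustrative.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    squared_error = 0.0
    count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        for ref_px, rec_px in zip(ref_row, rec_row):
            squared_error += (ref_px - rec_px) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [[100, 120], [140, 160]]
reconstructed = [[105, 115], [145, 155]]  # every pixel is off by 5
score = psnr(reference, reconstructed)
```

    For CT data stored with a wider intensity range, max_value would need to match that range for the result to be comparable across studies.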
    Survey of Online Course Recommendation System
    YU Peng, LIU Xingyu, CHENG Hao, YANG Jiaqi, CHEN Guohua, HE Chaobo
    Computer Engineering and Applications    2023, 59 (22): 1-14.   DOI: 10.3778/j.issn.1002-8331.2305-0162
    Abstract265)      PDF(pc) (692KB)(216)       Save
    The rapid development of online education has led to explosive growth in the number of online courses, and the resulting “course overload” makes it hard for learners to find course information efficiently, which has driven the emergence and development of online course recommendation systems. Online course recommendation systems have become a research hot spot, and a large number of methods have been proposed, so a systematic review and analysis of the latest research progress is needed. This paper first summarizes the basic framework and related concepts of online course recommendation systems, and then focuses on comparing and analyzing the core recommendation methods used in existing systems, including methods based on association rule mining, matrix factorization, probabilistic models, deep learning, intelligent optimization, and semantic computing, among others. Finally, this paper introduces the evaluation metrics and publicly available datasets for online course recommendation systems, and proposes future development directions.
    Research Progress of Machine Vision in Crop Seed Inspection
    WANG Hao, ZHU Yuhua, LI Zhihui, ZHEN Tong
    Computer Engineering and Applications    2023, 59 (22): 69-83.   DOI: 10.3778/j.issn.1002-8331.2303-0166
    Abstract250)      PDF(pc) (858KB)(154)       Save
    Crop seeds are the basis of agricultural production. Seed testing plays an indispensable role in all aspects of seed production, trade, and utilization. However, traditional crop seed identification methods are inefficient and depend on manual labor as well as specialized testing equipment. In contrast, machine vision technology, which simulates human visual function, enables efficient and accurate non-destructive detection, and helps automate and add intelligence to variety identification, grading, and classification of crop seeds. The paper first briefly describes image acquisition and pre-processing methods in machine vision technology and presents the current mainstream processing flow using corn seeds as an example. It then describes how the two families of machine vision detection methods, traditional machine learning and deep learning, are applied to crop seed detection. Finally, research on unsound corn kernels is described in detail under the same two families of methods, and the current problems and future research directions of unsound corn kernel detection are pointed out.
    Review of Development of Deep Learning Optimizer
    CHANG Xilong, LIANG Kun, LI Wentao
    Computer Engineering and Applications    2024, 60 (7): 1-12.   DOI: 10.3778/j.issn.1002-8331.2307-0370
    Abstract246)      PDF(pc) (1327KB)(281)       Save
    Optimization algorithms are the most critical factor in improving the performance of deep learning models, which they achieve by minimizing the loss function. As large language models (LLMs) such as GPT have become the research focus in natural language processing, the optimization effect of the traditional gradient descent algorithm has proved limited. Adaptive moment estimation algorithms have therefore emerged, and they are significantly superior to traditional optimization algorithms in generalization ability. Optimization algorithms based on gradient descent, adaptive gradient, and adaptive moment estimation are reviewed, and their pros and cons are analyzed. This paper applies the optimization algorithms to the Transformer architecture and selects a French-English translation task as the evaluation benchmark. Experiments show that adaptive moment estimation algorithms can effectively improve model performance in machine translation tasks. The development directions and applications of optimization algorithms are also discussed.
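    The difference between a plain gradient descent update and an adaptive moment estimation (Adam) update can be sketched for a single scalar parameter. The hyperparameter defaults and the toy objective f(x) = x^2 are illustrative assumptions, not the experimental setup of the paper.

```python
import math


def sgd_step(param, grad, lr=0.1):
    """Plain gradient descent: step size scales directly with the gradient."""
    return param - lr * grad


class Adam:
    """Adam for one scalar parameter, with bias-corrected moment estimates."""

    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running mean of gradients (first moment)
        self.v = 0.0  # running mean of squared gradients (second moment)
        self.t = 0    # step counter used for bias correction

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


# Toy objective f(x) = x^2 with gradient 2x, starting from x = 5.
x0 = 5.0
grad0 = 2 * x0
x_sgd = sgd_step(x0, grad0)
x_adam = Adam().step(x0, grad0)
```

    On the first step the Adam update has magnitude close to lr regardless of how large the gradient is, whereas the gradient descent step grows with the gradient; this per-parameter rescaling is the adaptive behavior the abstract refers to.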
    Improved YOLOv8 Small Target Detection Algorithm in Aerial Images
    FU Jinyi, ZHANG Zijia, SUN Wei, ZOU Kaixin
    Computer Engineering and Applications    2024, 60 (6): 100-109.   DOI: 10.3778/j.issn.1002-8331.2311-0281
    Abstract244)      PDF(pc) (771KB)(226)       Save
    In aerial image detection tasks, objects are small, their scales vary, and detail information is unclear, which leads to missed and false detections. To address this, an improved small target detection algorithm, CA-YOLOv8, is proposed. Channel feature partial convolution (CFPConv) is designed and used to rebuild the Bottleneck structure in C2f; the resulting module is named CFP_C2f. It replaces some C2f modules in the YOLOv8 head and neck, which strengthens the effective channel feature weights and improves the ability to capture multi-scale detail features. A context aggregated module (CAM) is embedded to improve context aggregation, optimize the response of feature channels, and strengthen the perception of details in deep features. The NWD loss function is added and combined with CIoU as the localization regression loss to reduce sensitivity to position deviation. To make full use of multiple attention mechanisms, the original detection head is replaced with DyHead (dynamic head). In experiments on the VisDrone2019 dataset, the improved algorithm has 33.3% fewer parameters than the original YOLOv8s model, and its mAP50 and mAP50:95 are 8.7 and 5.7 percentage points higher, respectively, which confirms its effectiveness.
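    The normalized Wasserstein distance (NWD) referred to above models each box as a 2-D Gaussian and compares the Gaussians instead of the box overlap. The sketch below follows the commonly cited formulation; the constant c and the blending weight ratio are assumptions (the abstract does not state how NWD and CIoU are weighted), and the CIoU loss is taken as a precomputed number rather than implemented here.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between boxes given as (cx, cy, w, h).

    Each box is treated as a Gaussian with mean (cx, cy) and standard
    deviations (w/2, h/2). The constant c is dataset dependent; 12.8 is
    only an assumed default.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_distance = math.sqrt(
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-w2_distance / c)


def blended_box_loss(ciou_loss, nwd_value, ratio=0.5):
    """Hypothetical blend of an NWD loss with a precomputed CIoU loss."""
    return ratio * (1.0 - nwd_value) + (1.0 - ratio) * ciou_loss


identical = nwd((10, 10, 4, 4), (10, 10, 4, 4))
shifted = nwd((10, 10, 4, 4), (13, 14, 4, 4))  # centers 5 pixels apart
loss = blended_box_loss(ciou_loss=0.4, nwd_value=identical)
```

    Because the measure decays smoothly with center distance, a shift of a few pixels on a tiny box does not collapse the similarity to zero the way IoU can, which is the reduced position sensitivity the abstract describes.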
    Review on Human Action Recognition Methods Based on Multimodal Data
    WANG Cailing, YAN Jingjing, ZHANG Zhidong
    Computer Engineering and Applications    2024, 60 (9): 1-18.   DOI: 10.3778/j.issn.1002-8331.2310-0090
    Abstract242)      PDF(pc) (8541KB)(374)       Save
    Human action recognition (HAR) is widely applied in the fields of intelligent security, autonomous driving and human-computer interaction. With advances in capture equipment and sensor technology, the data that can be acquired for HAR is no longer limited to RGB data but also includes multimodal data such as depth, skeleton, and infrared data. Feature extraction methods in HAR based on RGB and skeleton data modalities are introduced in detail, including handcrafted and deep learning-based methods. For RGB data modalities, feature extraction algorithms based on two-stream convolutional neural network (2s-CNN), 3D convolutional neural network (3DCNN) and hybrid network are analyzed. For skeleton data modalities, some popular single-person and multi-person pose estimation algorithms are first introduced. The classification algorithms based on convolutional neural network (CNN), recurrent neural network (RNN), and graph convolutional neural network (GCN) are then analyzed in depth. A further comprehensive overview of the common datasets for both data modalities is presented. In addition, the current challenges are explored based on the corresponding data structure features of RGB and skeleton. Finally, future research directions for deep learning-based HAR methods are discussed.
    Survey of Few-Shot Image Classification Based on Deep Meta-Learning
    ZHOU Bojun, CHEN Zhiyu
    Computer Engineering and Applications    2024, 60 (8): 1-15.   DOI: 10.3778/j.issn.1002-8331.2308-0271
    Abstract240)      PDF(pc) (1091KB)(298)       Save
    Deep meta-learning has emerged as a popular paradigm for addressing few-shot classification problems. A comprehensive review of recent advancements in few-shot image classification algorithms based on deep meta-learning is provided. Starting from the problem description, the categories of deep meta-learning algorithms for few-shot image classification are summarized, and commonly used few-shot image classification datasets and evaluation criteria are introduced. Subsequently, typical models and the latest research progress are elaborated in three aspects: model-based deep meta-learning methods, optimization-based deep meta-learning methods, and metric-based deep meta-learning methods. Finally, the performance analysis of existing algorithms on popular public datasets is presented, the research hotspots in this topic are summarized, and its future research directions are discussed.
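    As an illustration of the metric-based family, the sketch below classifies a query by its distance to class prototypes, in the style of prototypical networks. The abstract does not name specific methods, so this choice is an assumption, as is the simplification that embeddings are already computed (a real system would obtain them from a learned encoder).

```python
def class_prototypes(support_set):
    """Average the support embeddings of each class into one prototype."""
    prototypes = {}
    for label, embeddings in support_set.items():
        dim = len(embeddings[0])
        prototypes[label] = [
            sum(vec[i] for vec in embeddings) / len(embeddings) for i in range(dim)
        ]
    return prototypes


def classify(query, prototypes):
    """Assign the query to the class with the nearest prototype."""

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# A 2-way 2-shot episode with toy 2-D embeddings.
support = {
    "cat": [[0.0, 0.0], [0.0, 2.0]],
    "dog": [[4.0, 4.0], [6.0, 4.0]],
}
protos = class_prototypes(support)
prediction_a = classify([1.0, 1.0], protos)
prediction_b = classify([4.0, 3.0], protos)
```

    No parameters are updated at test time: new classes are handled simply by computing new prototypes from their few labeled examples.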
    Algorithm for Real-Time Vehicle Detection from UAVs Based on Optimizing and Improving YOLOv8
    SHI Tao, CUI Jie, LI Song
    Computer Engineering and Applications    2024, 60 (9): 79-89.   DOI: 10.3778/j.issn.1002-8331.2312-0291
    Abstract238)      PDF(pc) (4614KB)(315)       Save
    To address the low accuracy, susceptibility to background interference, and difficulty in detecting small target vehicles of existing UAV vehicle detection algorithms, an improved UAV vehicle detection algorithm, YOLOv8-CX, is proposed based on YOLOv8. By integrating the advantages of Deformable Convolutional Networks v1-3, a C2f-DCN module is proposed to sample features flexibly and better extract features from vehicles of different sizes. Drawing on the idea of large separable kernel attention, an SPPF-LSKA module with long-range dependency and self-adaptability is proposed, which effectively reduces background interference in vehicle detection. In the neck network, a CF-FPN (ment network for tiny object detection) feature fusion structure is adopted to enhance the detection accuracy of small targets by combining contextual information and suppressing conflicts between features at different scales. Finally, the original YOLOv8 head is replaced with a Dynamic Head detection head, which unifies three types of attention (scale, spatial, and task) to further improve detection performance. Experimental results show that on the Mapsai dataset, compared with the original algorithm, the improved algorithm increases the accuracy (P), recall (R) and mean average precision (mAP) by 8.5, 11.2 and 6.2 percentage points respectively, and the detection speed reaches 72.6 FPS, meeting the real-time requirements of UAV vehicle detection. Comparison with other mainstream target detection algorithms validates the effectiveness and superiority of this method.
    Small Sample Steel Plate Defect Detection Algorithm of Lightweight YOLOv8
    DOU Zhi, GAO Haoran, LIU Guoqi, CHANG Baofang
    Computer Engineering and Applications    2024, 60 (9): 90-100.   DOI: 10.3778/j.issn.1002-8331.2311-0070
    Abstract238)      PDF(pc) (5010KB)(305)       Save
    Steel plates have a large surface area, and surface defects are very common, occurring in many classes but with few samples of each. Deep learning is therefore difficult to apply effectively to the detection of such small-sample defects. To solve this problem, a small sample steel plate defect detection algorithm based on lightweight YOLOv8 is proposed. Firstly, an interactive data augmentation algorithm based on fuzzy search is proposed, which addresses the problem that the network model cannot be trained effectively when training samples are scarce, making it feasible to apply deep learning in this field. Then, LMRNet (lightweight multi-scale residual networks) is designed to replace the backbone of YOLOv8, making the network model lightweight and improving its portability. Finally, the CBFPN (context bidirectional feature pyramid network) and ECSA (efficient channel spatial attention) modules are proposed to make the network more effective in extracting and fusing scar features, and the Wise-IoU loss function is adopted to improve detection performance. Comparative experiments show that, relative to the original YOLOv8 algorithm, the improved network has only 30% of the parameters and 49% of the computation, and its speed increases by 9 frames/s. Accuracy, recall, and mAP increase by 2.9, 6.5, and 5.5 percentage points, respectively. These results verify the advantages of the proposed algorithm.
    Research on Gesture Recognition Based on Improved YOLOv5 and Mediapipe
    NI Guangxing, XU Hua, WANG Chao
    Computer Engineering and Applications    2024, 60 (7): 108-118.   DOI: 10.3778/j.issn.1002-8331.2308-0097
    Abstract236)      PDF(pc) (686KB)(194)       Save
    Existing gesture recognition algorithms suffer from heavy computation and poor robustness. In this paper, a gesture recognition method based on the IYOLOv5-Med (improved YOLOv5 Mediapipe) algorithm is proposed. The algorithm combines an improved YOLOv5 algorithm with the Mediapipe method and comprises gesture detection and gesture analysis. For gesture detection, the traditional YOLOv5 algorithm is improved in three ways. Firstly, the C3 module is reconstructed with FastNet. Secondly, the CBS module is replaced by the GhostConv module from GhostNet. Thirdly, the SE attention mechanism module is introduced at the end of the Backbone network. The improved algorithm has a smaller model size and is better suited to resource-limited edge devices. For gesture analysis, a method based on Mediapipe is proposed: the key points of the hand are detected within the gesture region located by the detection stage, the relevant features are extracted, and the gesture is then identified by a naive Bayes classifier. Experimental results confirm the effectiveness of the proposed IYOLOv5-Med algorithm. Compared with the conventional YOLOv5 algorithm, the parameters are reduced by 34.5%, the computations by 34.9%, and the model weight by 33.2%. The final average recognition rate reaches 0.997, and the method is relatively simple to implement, giving it good application prospects.
    Survey on Attack Methods and Defense Mechanisms in Federated Learning
    ZHANG Shiwen, CHEN Shuang, LIANG Wei, LI Renfa
    Computer Engineering and Applications    2024, 60 (5): 1-16.   DOI: 10.3778/j.issn.1002-8331.2306-0243
    Abstract234)      PDF(pc) (792KB)(259)       Save
    Attack and defense techniques are the core issue of federated learning system security. A deep understanding of these techniques can significantly reduce the risk of attack, greatly enhance the security of federated learning systems, advance research in the field, and promote the widespread application of federated learning. It is therefore of great significance to study them. Firstly, this paper briefly introduces the concept, basic workflow, types, and potential security issues of federated learning. Subsequently, it introduces the attacks that a federated learning system may encounter and summarizes the relevant research along the way. Then, according to whether a defense is aimed at a specific attack, the defense measures are divided into two categories, universal defense measures and targeted defense measures, and each category is summarized. Finally, the paper reviews and analyzes future research directions for the security of federated learning, providing a reference for researchers working in this area.
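    The aggregation step of the basic federated learning workflow, and why it is an attack surface, can be sketched as follows. The sketch assumes FedAvg-style weighted averaging and uses a coordinate-wise median as one example of a robust aggregation rule; the abstract does not name specific attacks or defenses, so both choices and all toy numbers are illustrative.

```python
from statistics import median


def fed_avg(client_updates):
    """Average client parameter vectors, weighted by local sample counts.

    client_updates is a list of (parameter_vector, num_samples) pairs.
    """
    total = sum(num_samples for _, num_samples in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(params[i] * num_samples for params, num_samples in client_updates) / total
        for i in range(dim)
    ]


def coordinate_median(client_updates):
    """Robust alternative: take the median of each coordinate across clients."""
    dim = len(client_updates[0][0])
    return [median(params[i] for params, _ in client_updates) for i in range(dim)]


honest = [([1.0, 1.0], 10), ([1.2, 0.8], 10)]
# One malicious client submits an extreme update (a simple poisoning attempt).
poisoned = honest + [([100.0, -100.0], 10)]

averaged = fed_avg(poisoned)
robust = coordinate_median(poisoned)
weighted = fed_avg([([1.0, 2.0], 10), ([3.0, 4.0], 30)])
```

    A single extreme update drags the weighted average far from the honest clients' values, while the median stays within their range; this is the kind of trade-off between plain aggregation and defensive aggregation that such surveys examine.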
    Survey of Chinese Named Entity Recognition Research
    ZHAO Jigui, QIAN Yurong, WANG Kui, HOU Shuxiang, CHEN Jiaying
    Computer Engineering and Applications    2024, 60 (1): 15-27.   DOI: 10.3778/j.issn.1002-8331.2304-0398
    Abstract231)      PDF(pc) (606KB)(160)       Save
    Named entity recognition (NER) is one of the most fundamental tasks in natural language processing; its main goal is to identify the boundaries and types of entities with specific meanings in natural language text. However, the data for Chinese named entity recognition (CNER) suffer from problems such as blurred word boundaries, semantic diversity, blurred morphological features, and limited Chinese corpora, which make it difficult to improve CNER performance. This paper first introduces the datasets, annotation schemes, and evaluation metrics of CNER. Secondly, following the development of CNER research, CNER methods are classified into three categories: rule-based methods, statistics-based methods, and deep learning-based methods, and the main deep learning-based CNER models of the past five years are summarized. Finally, the research trends of CNER are discussed to provide a reference for new methods and future research directions.
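    How an annotation scheme encodes entity boundaries and types can be shown with a small decoder. The abstract does not say which schemes are covered; this sketch assumes the common character-level BIO scheme, and the sentence, tags, and function name are illustrative.

```python
def decode_bio(chars, tags):
    """Convert character-level BIO tags into (text, type, start, end) spans."""
    entities = []
    start, label = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            entities.append(("".join(chars[start:i]), label, start, i))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            # "O" or a stray "I-" tag without a matching "B-": no open entity.
            start, label = None, None
    return entities


chars = list("小明在北京工作")
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
entities = decode_bio(chars, tags)
```

    Tagging individual characters rather than words sidesteps the word segmentation step, which is one way systems cope with the blurred word boundaries mentioned above.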
    Small Object Detection Algorithm Based on ATO-YOLO
    SU Jia, QIN Yichang, JIA Ze, WANG Jing
    Computer Engineering and Applications    2024, 60 (6): 68-77.   DOI: 10.3778/j.issn.1002-8331.2308-0385
    Abstract226)      PDF(pc) (795KB)(219)       Save
    Small object detection is of great significance in the field of computer vision. However, existing methods often suffer from issues such as missed detection and false alarms when dealing with challenges like scale variation, dense object arrangement, and irregular layouts. To address these problems, ATO-YOLO, an improved version of the YOLOv5 algorithm, is proposed. Firstly, this paper introduces an adaptive feature extraction (AFE) module that incorporates an attention mechanism to enhance the feature representation capability of the detection model. By dynamically adjusting the weight allocation to highlight key object features, AFE improves the accuracy and robustness of object detection tasks in various scenarios. Secondly, a triple feature fusion (TFF) mechanism is designed to effectively utilize multi-scale information by fusing feature maps from different scales, resulting in more comprehensive object features and enhanced detection performance for small objects. Lastly, an output reconstruction (ORS) module is introduced, which removes the large object detection layer and adds a small object detection layer, enabling precise localization and recognition of small objects. This module also reduces model complexity and improves detection speed compared to the original model. Experimental results demonstrate that the ATO-YOLO algorithm achieves an mAP@0.5 of 38.2% on the VisDrone dataset, a 6.1 percentage point improvement over YOLOv5, with a relative FPS increase of 4.4%. This algorithm enables fast and accurate detection of small objects.
    Survey About Generative Adversarial Network and Text-to-Image Synthesis
    LAI Li’na, MI Yu, ZHOU Longlong, RAO Jiyong, XU Tianyang, SONG Xiaoning
    Computer Engineering and Applications    2023, 59 (19): 21-39.   DOI: 10.3778/j.issn.1002-8331.2211-0392
    Abstract222)      PDF(pc) (933KB)(164)       Save
    With the spread of multi-sensor systems, multi-modal data has received continuous attention from the research community and industry, and processing multi-source modal information through deep learning is the core technology. Text-to-image generation is one direction of multi-modal technology. Because the images generated by generative adversarial networks (GANs) are more realistic, text-to-image generation has made excellent progress. It can be used in many fields such as image editing and colorization, style transfer, object deformation, and photo enhancement. In this review, GAN networks are divided by image generation function into four categories: semantic-enhanced GAN, growth-able GAN, diversity-enhanced GAN, and intelligence-enhanced GAN. Following this taxonomy, the function-based text-to-image generation models are integrated and compared to clarify how the field has developed. The existing evaluation indicators and commonly used datasets are analyzed, and the feasibility and future development trend of complex text processing are clarified. This review systematically complements the analysis of generative adversarial networks in text-to-image generation and will help researchers further advance this field.
    Survey of Vision Transformer in Low-Level Computer Vision
    ZHU Kai, LI Li, ZHANG Tong, JIANG Sheng, BIE Yiming
    Computer Engineering and Applications    2024, 60 (4): 39-56.   DOI: 10.3778/j.issn.1002-8331.2304-0139
    Abstract221)      PDF(pc) (3488KB)(168)       Save
    Transformer is a revolutionary neural network architecture initially designed for natural language processing. However, its outstanding performance and versatility have led to widespread applications in the field of computer vision. While there is a wealth of research and literature on Transformer applications in natural language processing, there remains a relative scarcity of specialized reviews focusing on low-level visual tasks. In light of this, this paper begins by providing a brief introduction to the principles of Transformer and analyzing several variants. Subsequently, the focus shifts to the application of Transformer in low-level visual tasks, specifically in the key areas of image restoration, image enhancement, and image generation. Through a detailed analysis of the performance of different models in these tasks, this paper explores the variations in their effectiveness on commonly used datasets. This includes achievements in restoring damaged images, improving image quality, and generating realistic images. Finally, this paper summarizes and forecasts the development trends of Transformer in the field of low-level visual tasks. It suggests directions for future research to further drive innovation and advancement in Transformer applications. The rapid progress in this field promises breakthroughs for computer vision and image processing, providing more powerful and efficient solutions for practical applications.
    Vehicle Detection Algorithm Based on Improved YOLOv8 in Traffic Surveillance
    ZHOU Fei, GUO Dudu, WANG Yang, WANG Qingqing, QIN Yin, YANG Zhuomin, HE Haijun
    Computer Engineering and Applications    2024, 60 (6): 110-120.   DOI: 10.3778/j.issn.1002-8331.2310-0101
    Abstract216)      PDF(pc) (817KB)(231)       Save
    To address the insufficient vehicle detection accuracy and slow detection speed in complex traffic monitoring scenarios, a lightweight vehicle detection algorithm based on the YOLOv8 model is proposed. Firstly, FasterNet replaces the backbone feature extraction network of YOLOv8, which reduces redundant computation and memory access and improves the detection accuracy and inference speed of the model. Secondly, the SimAM attention module is added to the Backbone and Neck sections, which enhances the important features of the target vehicles without increasing the original network parameters and improves the feature fusion capability. Then, to address the poor detection of small vehicles in dense traffic flow, a small target detection head is added to better capture the features and contextual information of small vehicles. Finally, Wise-IoU, which can adaptively adjust the weight coefficients, is used as the loss function of the improved model, which enhances the regression performance of the bounding box and the robustness of the detection. Experimental results on the UA-DETRAC dataset show that, compared with the original model, the improved method achieves better detection accuracy and speed in the traffic monitoring system, with mAP and FPS improved by 3.06 percentage points and 3.36%, respectively. It effectively alleviates the poor detection of small target vehicles in complex traffic scenarios and achieves a good balance between accuracy and speed.
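    The claim that SimAM adds attention without adding parameters can be illustrated with a single-channel sketch. It follows the formulation as commonly described for SimAM (per-position weights derived from the squared deviation from the channel mean); the regularization constant e_lambda and the plain-list representation are assumptions, and this is not the authors' code.

```python
import math


def simam(channel, e_lambda=1e-4):
    """Reweight one feature-map channel (a 2-D list) in the style of SimAM.

    No learnable parameters are involved: each weight depends only on how
    far a value lies from the channel mean relative to the channel variance.
    Requires at least two values in the channel.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1
    mean = sum(values) / len(values)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Positions that stand out from the channel get a larger score.
            inv_energy = d / (4.0 * (variance + e_lambda)) + 0.5
            out_row.append(v * (1.0 / (1.0 + math.exp(-inv_energy))))
        out.append(out_row)
    return out


channel = [[0.0, 0.0], [0.0, 4.0]]
reweighted = simam(channel)
```

    In a detector this would be applied independently to every channel of a feature map, typically with tensor operations rather than Python lists.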
    Lightweight Foggy Weather Object Detection Method Based on YOLOv5
    LAI Jing’an, CHEN Ziqiang, SUN Zongwei, PEI Qingqi
    Computer Engineering and Applications    2024, 60 (6): 78-88.   DOI: 10.3778/j.issn.1002-8331.2308-0029
    Abstract208)      PDF(pc) (1220KB)(220)       Save
    To address the low accuracy and high model complexity of object detection algorithms in foggy scenes, a lightweight foggy object detection method based on YOLOv5 is proposed. Firstly, the receptive field attention module (RFAblock) is adopted, which adds an attention mechanism to the receptive field by interacting with receptive field feature information, improving feature extraction ability. Secondly, the lightweight network Slimneck is used as the neck structure to reduce model parameters and complexity while maintaining accuracy. The angle vector between the ground-truth box and the predicted box is introduced in the loss function to improve training speed and inference accuracy. PNMS (precise non-maximum suppression) is used to improve the candidate box selection mechanism and reduce the missed detection rate when vehicles are occluded. Finally, experiments are conducted on the real-world foggy dataset RTTS and the synthetic foggy dataset Foggy Cityscapes. The results show that mAP50 improves by 4.9 and 3.5 percentage points, respectively, compared with YOLOv5l, while the number of model parameters is only 54.6% of that of YOLOv5l.