Most Downloaded Articles

    Published in last 1 year
    Improved YOLOv8s Model for Small Object Detection from Perspective of Drones
    PAN Wei, WEI Chao, QIAN Chunyu, YANG Zhe
    Computer Engineering and Applications    2024, 60 (9): 142-150.   DOI: 10.3778/j.issn.1002-8331.2312-0043
    Abstract (592)      PDF(pc) (5858KB) (733)
    Object detection from the drone perspective yields imprecise results because image targets are small and densely distributed, classes are unevenly distributed, and hardware conditions limit model size. An improved model based on YOLOv8s with multiple attention mechanisms is proposed. To solve the problem of shared attention weight parameters in receptive field features and to enhance feature extraction, receptive field attention convolution and the CBAM (convolutional block attention module) attention mechanism are introduced into the backbone, adding attention weights in the channel and spatial dimensions. Large separable kernel attention is introduced into the feature pyramid pooling layers to increase information fusion between features at different levels. Feature layers rich in semantic information about small targets are added to improve the neck structure. The inner-IoU loss function is used to improve the MPDIoU (minimum point distance based IoU) function, and the resulting inner-MPDIoU replaces the original loss function to enhance learning on difficult samples. Experimental results show that the improved YOLOv8s model raises mAP, P, and R by 16.1%, 9.3%, and 14.9% respectively on the VisDrone dataset, surpasses YOLOv8m in performance, and can be effectively applied to unmanned aerial vehicle visual detection tasks.
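As a rough illustration of the loss described in the abstract above, the following sketch combines MPDIoU with the Inner-IoU auxiliary-box idea in plain Python. It assumes the commonly cited formulations: MPDIoU subtracts the squared distances between matching corners, normalised by the squared image diagonal, from IoU, and Inner-IoU computes IoU on boxes scaled about their centres. The function names, the `ratio` default of 0.75, and the way the two terms are combined are assumptions for illustration and are not taken from the paper.

```python
def iou(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def scale_box(box, ratio):
    """Scale a box about its centre (the Inner-IoU auxiliary box)."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) * ratio / 2
    half_h = (box[3] - box[1]) * ratio / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def inner_mpdiou_loss(pred, gt, img_w, img_h, ratio=0.75):
    """Assumed form: 1 - IoU_inner + d1^2/(w^2+h^2) + d2^2/(w^2+h^2)."""
    inner = iou(scale_box(pred, ratio), scale_box(gt, ratio))
    norm = img_w ** 2 + img_h ** 2
    # Squared distance between the top-left corners and the bottom-right corners.
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    return 1.0 - inner + d1 / norm + d2 / norm
```

With `ratio` below 1 the auxiliary boxes shrink, so a small misalignment lowers the inner IoU faster than the plain IoU; this is the usual motivation given for using it on hard samples.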
    Research on Urban Logistics Distribution Mode of Bus-Assisted Drones
    PENG Yong, REN Zhi
    Computer Engineering and Applications    2024, 60 (7): 335-343.   DOI: 10.3778/j.issn.1002-8331.2212-0252
    Abstract (709)      PDF(pc) (755KB) (586)
    The rapid development of e-commerce is forcing the logistics industry to transform and upgrade continuously. Because local governments encourage the development of public transport and advocate green, low-carbon logistics distribution, a bus-assisted drone distribution mode is studied. After the problem is described, a mathematical model that minimizes distribution cost is constructed, and a smart general variable neighborhood search metaheuristic is designed to solve it. To improve the efficiency of the algorithm, K-means clustering and a greedy algorithm are introduced to generate the initial solution. Firstly, on instances of different scales, a variety of local search strategies and algorithms are compared to verify the effectiveness of the algorithm. Secondly, using standard CVRP instances, the single-truck distribution mode and the truck-drone collaborative distribution mode are compared with the bus-assisted drone mode to demonstrate its cost and time advantages. Finally, Beijing Bus Rapid Transit Line 2 and its surrounding customer points are selected, and a sensitivity analysis is performed by changing the bus stop spacing and departure interval. The results show that increasing the stop spacing has a greater impact than changing the departure interval.
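The abstract above mentions generating an initial solution with K-means clustering and a greedy algorithm but gives no details. One plausible reading, sketched below in plain Python, is to cluster customer points and then order each cluster by nearest neighbour. The function names, the deterministic seeding, and the single `depot` start point are all hypothetical; the paper's actual construction (for example, how bus stops enter the routes) may differ.

```python
import math


def kmeans(points, k, iters=20):
    """Tiny Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels


def greedy_route(start, customers):
    """Nearest-neighbour ordering of one cluster, starting from `start`."""
    route, here, left = [], start, list(customers)
    while left:
        nxt = min(left, key=lambda p: math.dist(here, p))
        left.remove(nxt)
        route.append(nxt)
        here = nxt
    return route


def initial_solution(depot, customers, k):
    """One greedy route per K-means cluster (hypothetical construction)."""
    labels = kmeans(customers, k)
    return [greedy_route(depot, [p for p, lab in zip(customers, labels) if lab == c])
            for c in range(k)]
```

A solution built this way is only a starting point; a neighbourhood search such as the one named in the abstract would then try to improve it.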
    Review on Human Action Recognition Methods Based on Multimodal Data
    WANG Cailing, YAN Jingjing, ZHANG Zhidong
    Computer Engineering and Applications    2024, 60 (9): 1-18.   DOI: 10.3778/j.issn.1002-8331.2310-0090
    Abstract (299)      PDF(pc) (8541KB) (429)
    Human action recognition (HAR) is widely applied in the fields of intelligent security, autonomous driving and human-computer interaction. With advances in capture equipment and sensor technology, the data that can be acquired for HAR is no longer limited to RGB data but also includes multimodal data such as depth, skeleton, and infrared data. Feature extraction methods in HAR based on the RGB and skeleton data modalities are introduced in detail, including handcrafted and deep learning-based methods. For the RGB modality, feature extraction algorithms based on the two-stream convolutional neural network (2s-CNN), the 3D convolutional neural network (3DCNN) and hybrid networks are analyzed. For the skeleton modality, some popular single-person and multi-person pose estimation algorithms are first introduced. Classification algorithms based on the convolutional neural network (CNN), recurrent neural network (RNN), and graph convolutional neural network (GCN) are then analyzed with emphasis. A comprehensive overview of the common datasets for both data modalities is also presented. In addition, the current challenges are explored based on the data structure features of RGB and skeleton data. Finally, future research directions for deep learning-based HAR methods are discussed.
    Small Sample Steel Plate Defect Detection Algorithm of Lightweight YOLOv8
    DOU Zhi, GAO Haoran, LIU Guoqi, CHANG Baofang
    Computer Engineering and Applications    2024, 60 (9): 90-100.   DOI: 10.3778/j.issn.1002-8331.2311-0070
    Abstract (384)      PDF(pc) (5010KB) (424)
    Steel plates have a large surface area, and surface defects are common, occurring in many classes but with few samples of each. Deep learning is therefore difficult to apply effectively to the detection of such small-sample defects. To solve this problem, a small sample steel plate defect detection algorithm based on lightweight YOLOv8 is proposed. Firstly, an interactive data augmentation algorithm based on fuzzy search is proposed, which addresses the problem that the network model cannot be trained effectively for lack of training samples and makes it possible to apply deep learning in this field. Then, LMRNet (lightweight multi-scale residual networks) is designed to replace the backbone of YOLOv8, making the network model lightweight and more portable. Finally, the CBFPN (context bidirectional feature pyramid network) and ECSA (efficient channel spatial attention) modules are proposed to make the network more effective at extracting and fusing scar features, and the Wise-IoU loss function is adopted to improve detection performance. Comparative experiments show that, relative to the original YOLOv8 algorithm, the improved network has only 30% of the parameters and 49% of the computation, and its speed increases by 9 frames/s. The accuracy rate, recall rate and mAP increase by 2.9, 6.5 and 5.5 percentage points respectively. The experimental results verify the advantages of the proposed algorithm.
    Algorithm for Real-Time Vehicle Detection from UAVs Based on Optimizing and Improving YOLOv8
    SHI Tao, CUI Jie, LI Song
    Computer Engineering and Applications    2024, 60 (9): 79-89.   DOI: 10.3778/j.issn.1002-8331.2312-0291
    Abstract (346)      PDF(pc) (4614KB) (414)
    To address the low accuracy, susceptibility to background interference, and difficulty in detecting small target vehicles of existing UAV vehicle detection algorithms, an improved UAV vehicle detection algorithm, YOLOv8-CX, is proposed based on YOLOv8. By integrating the advantages of Deformable Convolutional Networks v1-3, a C2f-DCN module is proposed to sample features flexibly and better extract features of vehicles of different sizes. Drawing on the idea of large separable kernel attention, an SPPF-LSKA module with long-range dependency and self-adaptability is proposed, which effectively reduces background interference in vehicle detection. In the neck network, a CF-FPN (ment network for tiny object detection) feature fusion structure is adopted to improve the detection accuracy of small targets by combining contextual information and suppressing conflicts between features at different scales. Finally, the original YOLOv8 head is replaced with a Dynamic Head detection head, which unifies three types of attention mechanisms (scale, space and task) and further improves detection performance. Experimental results show that on the Mapsai dataset, compared with the original algorithm, the improved algorithm increases the accuracy (P), recall (R) and mean average precision (mAP) by 8.5, 11.2 and 6.2 percentage points respectively, and its detection speed reaches 72.6 FPS, meeting the real-time requirements of UAV vehicle detection. Comparison with other mainstream target detection algorithms validates the effectiveness and superiority of the method.
    Baggage Tracking Technology Based on Improved YOLO v8
    CAO Chao, GU Xingsheng
    Computer Engineering and Applications    2024, 60 (9): 151-158.   DOI: 10.3778/j.issn.1002-8331.2310-0238
    Abstract (295)      PDF(pc) (6479KB) (407)
    In the airport baggage sorting scenario, traditional multi-target tracking algorithms suffer from a high target ID switching rate and a high false alarm rate for target trajectories. This paper presents a baggage tracking technique based on improved YOLO v8 and ByteTrack algorithms. With the CBATM module added, the ADH decoupling head replaced, and the training loss function changed, detection accuracy increases, the discrimination of target features is strengthened, and the ID switching rate of targets is reduced. GSI interpolation is applied in the Byte data association, which not only uses both high-score and low-score boxes but also preserves tracking after long occlusions and reduces the erroneous ID switches that occlusion causes. On the airport baggage sorting dataset, MOTA and IDF1 reach 89.9% and 90.3% respectively, a significant improvement that allows luggage IDs to be tracked steadily.
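For readers unfamiliar with the two tracking metrics reported above, the sketch below computes MOTA and IDF1 from their standard definitions. The function and argument names are illustrative, and the counts in the usage note are made-up numbers, not values from the paper.

```python
def mota(false_negatives, false_positives, id_switches, gt_objects):
    """Multiple object tracking accuracy: 1 - (FN + FP + IDSW) / GT."""
    return 1.0 - (false_negatives + false_positives + id_switches) / gt_objects


def idf1(idtp, idfp, idfn):
    """ID F1 score: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)
```

For example, with hypothetical counts of 200 ground-truth objects, 10 misses, 6 false positives and 4 ID switches, `mota(10, 6, 4, 200)` is about 0.9. MOTA penalises every ID switch equally, whereas IDF1 rewards keeping the same identity over the whole trajectory, which is why the two are usually reported together.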
    Research Review on Deep Reinforcement Learning for Solving End-to-End Navigation Problems of Mobile Robots
    HE Li, YAO Jiacheng, LIAO Yuxin, ZHANG Wenzhi, LU Zhaoqing, YUAN Liang, XIAO Wendong
    Computer Engineering and Applications    2024, 60 (14): 1-13.   DOI: 10.3778/j.issn.1002-8331.2312-0256
    Abstract (197)      PDF(pc) (4646KB) (351)
    Autonomous navigation is the prerequisite and foundation for mobile robots to accomplish complex tasks. Traditional autonomous navigation systems rely on the accuracy of maps and cannot adapt to highly complex industrial and service scenarios. End-to-end navigation methods for mobile robots, which do not rely on a priori map information and can make autonomous decisions through deep reinforcement learning and learning from interaction with the environment, have become a new research hotspot. Most existing classifications cannot comprehensively summarize the challenges and opportunities of end-to-end navigation problems. Based on the characteristics of end-to-end navigation systems, the challenges of the navigation problem are attributed to three key issues: poor perception ability of navigation agents, ineffective learning, and poor generalization ability of navigation strategies. The research status and development trends of end-to-end navigation systems are described. Representative research results from recent years that address these key issues are detailed in turn, and their advantages and shortcomings are summarized. Finally, future development trends of end-to-end navigation for mobile robots are outlined in areas such as visual language navigation, multi-agent collaborative navigation, end-to-end navigation that fuses super-resolution reconstructed images, and interpretable end-to-end navigation, providing insights for the research and application of end-to-end navigation for mobile robots.
    Improved YOLOv8 Lightweight UAV Target Detection Algorithm
    HU Junfeng, LI Baicong, ZHU Hao, HUANG Xiaowen
    Computer Engineering and Applications    2024, 60 (8): 182-191.   DOI: 10.3778/j.issn.1002-8331.2310-0063
    Abstract (267)      PDF(pc) (813KB) (349)
    UAV target detection algorithms are computationally complex and difficult to deploy, and the long-tailed distribution of UAV data leads to low detection accuracy. To address these problems, a lightweight UAV target detection algorithm based on improved YOLOv8 (PC-YOLOv8-n) is proposed, which balances detection accuracy against computation and generalises to some extent to long-tailed data distributions. Partial convolution (PConv) layers replace the 3×3 convolutional layers in YOLOv8, making the network lightweight and reducing network redundancy and computational complexity. A dual-channel feature pyramid is fused in, adding top-down paths that fuse deep and shallow information, and a lightweight attention mechanism is introduced within the same layer to improve the feature extraction ability of the network. The equalized focal loss (EFL) is used as the category loss function; by equalising the gradient weights of the tail categories during training, it improves the category detection ability of the network. The experimental results show that PC-YOLOv8-n performs well on the VisDrone2019 dataset, improving mAP50 by 1.6 percentage points over the original YOLOv8-n algorithm, while the parameters and computation of the model are reduced to 2.6×10^6 and 7.6 GFLOPs respectively, and the detection speed reaches 77.2 FPS.
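To give a sense of why replacing 3×3 convolutions with partial convolution reduces computation, the sketch below compares operation counts. It assumes the count given in the FasterNet paper for PConv, h × w × k² × c_p², where only c_p of the channels are convolved and the rest pass through unchanged. The partial ratio of 1/4, the function names, and the feature-map size in the usage note are assumptions; the abstract does not state which ratio PC-YOLOv8-n uses.

```python
def conv_ops(h, w, k, c_in, c_out):
    """Operation count of a standard k x k convolution on an h x w feature map."""
    return h * w * k * k * c_in * c_out


def pconv_ops(h, w, k, channels, partial_ratio=0.25):
    """Operation count of PConv: a regular conv on only a fraction of the channels."""
    c_p = int(channels * partial_ratio)  # channels that are actually convolved
    return conv_ops(h, w, k, c_p, c_p)
```

For a hypothetical 80×80 feature map with 64 channels, `conv_ops(80, 80, 3, 64, 64)` is 16 times `pconv_ops(80, 80, 3, 64)`, matching the 1/16 figure quoted for a partial ratio of 1/4.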
    Survey of Few-Shot Image Classification Based on Deep Meta-Learning
    ZHOU Bojun, CHEN Zhiyu
    Computer Engineering and Applications    2024, 60 (8): 1-15.   DOI: 10.3778/j.issn.1002-8331.2308-0271
    Abstract (291)      PDF(pc) (1091KB) (343)
    Deep meta-learning has emerged as a popular paradigm for addressing few-shot classification problems. A comprehensive review of recent advancements in few-shot image classification algorithms based on deep meta-learning is provided. Starting from the problem description, the categories of deep meta-learning algorithms for few-shot image classification are summarized, and commonly used few-shot image classification datasets and evaluation criteria are introduced. Subsequently, typical models and the latest research progress are elaborated in three aspects: model-based, optimization-based, and metric-based deep meta-learning methods. Finally, a performance analysis of existing algorithms on popular public datasets is presented, the research hotspots in this topic are summarized, and future research directions are discussed.
    Research Progress on Designing Lightweight Deep Convolutional Neural Networks
    ZHOU Zhifei, LI Hua, FENG Yixiong, LU Jianguang, QIAN Songrong, LI Shaobo
    Computer Engineering and Applications    2024, 60 (22): 1-17.   DOI: 10.3778/j.issn.1002-8331.2404-0372
    Abstract (266)      PDF(pc) (6330KB) (336)
    Lightweight design is a popular paradigm to address the dependence of deep convolutional neural network (DCNN) on device performance and hardware resources, and the purpose of lightweighting is to increase the computational speed and reduce the memory footprint without sacrificing the network performance. An overview of lightweight design approaches for DCNNs is presented, focusing on a review of the research progress in recent years, including two major lightweighting strategies, namely, system design and model compression, as well as an in-depth comparison of the innovativeness, strengths and limitations of these two types of approaches, and an exploration of the underlying framework that supports the lightweighting model. In addition, scenarios in which lightweight networks have been successfully applied are described, and predictions are made for the future development trend of DCNN lightweighting, aiming to provide useful insights and references for the research on lightweight deep convolutional neural networks.
    Improved YOLOv8 Object Detection Algorithm for Traffic Sign Target
    TIAN Peng, MAO Li
    Computer Engineering and Applications    2024, 60 (8): 202-212.   DOI: 10.3778/j.issn.1002-8331.2309-0415
    Abstract (396)      PDF(pc) (937KB) (334)
    Although current detection technology is increasingly mature, detecting small targets in complex environments remains the most difficult point in research. To address the high proportion of small targets among traffic signs in road traffic scenarios and the strong environmental interference, a traffic sign detection algorithm based on an improved YOLOv8 is proposed. Because small targets are prone to missed detections, the bi-level routing attention (BRA) mechanism is used to improve the network's perception of small targets. In addition, a deformable convolution v3 (DCNv3) module is used. It extracts features of irregular shapes in the feature map more effectively, so that the backbone network adapts better to irregular spatial structures and attends more accurately to important targets, thereby improving the model's ability to detect occluded and overlapping targets. Both the DCNv3 and BRA modules improve the accuracy of the model without increasing its weight. At the same time, the Inner-IoU loss function based on auxiliary bounding boxes is introduced. On the four datasets RoadSign, CCTSDB, TSDD, and GTSDB, small sample training, large sample training, single target detection, and multi-target detection are performed, and the results improve in each case. The experiments on the RoadSign dataset give the best results: mAP50 and mAP50:95 of the improved YOLOv8 model reach 90.7% and 75.1% respectively, which is 5.9 and 4.8 percentage points higher than the baseline model. The experimental results show that the improved YOLOv8 model effectively detects traffic signs in complex road scenarios.
    Research Progress on Recommendation Algorithms with Knowledge Graph Visualization Analysis
    LIN Suqing, LUO Dingnan, ZHANG Shuhua
    Computer Engineering and Applications    2024, 60 (21): 1-17.   DOI: 10.3778/j.issn.1002-8331.2312-0032
    Abstract (259)      PDF(pc) (1215KB) (317)
    The application and proliferation of internet technology has caused exponential growth in data, making information retrieval from massive datasets more complex. Recommendation algorithms have attracted significant attention for alleviating information overload, and relevant research findings continue to emerge. A total of 4,773 Chinese and 4,531 English publications from 2012 to 2024 have been sourced from China National Knowledge Infrastructure (CNKI) and the Web of Science (WOS) core collection. The visualization tools CiteSpace and VOSviewer have been used to generate basic information and keyword co-occurrence graphs for the literature. Core technology keywords, including knowledge graph, graph neural network, and deep learning, have been extracted through graph analysis, and the corresponding representative recommendation algorithms have been selected. The core mechanisms and underlying principles of the algorithms have been presented visually through charts, focusing on the limitations and challenges of existing research as well as targeted solutions. Knowledge architecture diagrams have been developed for the algorithms associated with each core technology keyword, following a challenge-solution-source literature framework. The visualization of recommendation principles has been effectively implemented.
    Review of Development of Deep Learning Optimizer
    CHANG Xilong, LIANG Kun, LI Wentao
    Computer Engineering and Applications    2024, 60 (7): 1-12.   DOI: 10.3778/j.issn.1002-8331.2307-0370
    Abstract (275)      PDF(pc) (1327KB) (304)
    Optimization algorithms, which work by minimizing the loss function, are the most critical factor in improving the performance of deep learning models. As large language models (LLMs) such as GPT have become the research focus in natural language processing, the optimization effect of the traditional gradient descent algorithm has become limited. Adaptive moment estimation algorithms have therefore emerged, which are significantly superior to traditional optimization algorithms in generalization ability. Gradient descent, adaptive gradient, and adaptive moment estimation algorithms are reviewed, and the pros and cons of these optimization algorithms are analyzed. This paper applies the optimization algorithms to the Transformer architecture and selects a French-English translation task as the evaluation benchmark. Experiments show that adaptive moment estimation algorithms can effectively improve model performance in machine translation tasks. The development direction and applications of optimization algorithms are also discussed.
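As a reminder of what an adaptive moment estimation update looks like, the sketch below implements the standard Adam rule for a single scalar parameter. It is a textbook version with the usual default decay rates, not code from the paper; the function name, the learning rate, and the quadratic test problem in the usage note are assumptions chosen for illustration.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimise a scalar function given its gradient; returns the iterates."""
    x, m, v = x0, 0.0, 0.0
    trajectory = [x]
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first moment (mean of gradients)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (mean of squared gradients)
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
        trajectory.append(x)
    return trajectory
```

Calling `adam_minimize(lambda x: 2 * x, 5.0)` minimises f(x) = x². The first step has size close to `lr` regardless of the gradient's magnitude, because the bias-corrected ratio reduces to g / |g|; this per-parameter scaling is what distinguishes the method from plain gradient descent.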
    Survey on Video-Text Cross-Modal Retrieval
    CHEN Lei, XI Yimeng, LIU Libo
    Computer Engineering and Applications    2024, 60 (4): 1-20.   DOI: 10.3778/j.issn.1002-8331.2306-0382
    Abstract (326)      PDF(pc) (3662KB) (295)
    Modalities define the specific forms in which data exist. The swift expansion of various modal data types has brought multimodal learning into the limelight. As a crucial subset of this field, cross-modal retrieval has achieved noteworthy advancements, particularly in integrating images and text. However, videos, as opposed to images, encapsulate a richer array of modal data and offer a more extensive spectrum of information. This richness aligns well with the growing user demand for comprehensive and adaptable information retrieval solutions. Consequently, video-text cross-modal retrieval has emerged as a burgeoning area of research. To thoroughly comprehend video-text cross-modal retrieval and its state-of-the-art developments, a methodical review and summary of the existing representative methods is conducted. Initially, the focus is on analyzing current deep learning-based unidirectional and bidirectional video-text cross-modal retrieval methods. This analysis includes an in-depth exploration of seminal works within each category, highlighting their strengths and weaknesses. Subsequently, the discussion shifts to an experimental viewpoint, introducing benchmark datasets and evaluation metrics specific to video-text cross-modal retrieval. The performance of several standard methods on benchmark datasets is compared. Finally, the application prospects and future research challenges of video-text cross-modal retrieval are discussed.
    Vehicle Detection Algorithm Based on Improved YOLOv8 in Traffic Surveillance
    ZHOU Fei, GUO Dudu, WANG Yang, WANG Qingqing, QIN Yin, YANG Zhuomin, HE Haijun
    Computer Engineering and Applications    2024, 60 (6): 110-120.   DOI: 10.3778/j.issn.1002-8331.2310-0101
    Abstract (277)      PDF(pc) (817KB) (287)
    To address insufficient vehicle detection accuracy and slow detection speed in complex traffic monitoring scenarios, a lightweight vehicle detection algorithm based on the YOLOv8 model is proposed. Firstly, FasterNet is used to replace the backbone feature extraction network of YOLOv8, which reduces redundant computation and memory access and improves the detection accuracy and inference speed of the model. Secondly, the SimAM attention module is added to the Backbone and Neck sections, which enhances the important features of the target vehicles without increasing the original network parameters and improves the feature fusion capability. Then, to address the poor detection of small vehicles in dense traffic flow, a small target detection head is added to better capture the features and contextual information of small vehicles. Finally, Wise-IoU, which can adaptively adjust the weight coefficients, is used as the loss function of the improved model, which enhances the regression performance of the bounding box and the robustness of detection. Experimental results on the UA-DETRAC dataset show that, compared with the original model, the improved method achieves better detection accuracy and speed in the traffic monitoring system, with mAP improved by 3.06 percentage points and FPS by 3.36%. The method effectively alleviates the poor detection of small target vehicles in complex traffic scenarios and achieves a good balance between accuracy and speed.
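The claim above that SimAM adds attention "without increasing the original network parameters" can be seen from its closed-form gate, sketched below for a single channel in plain Python. The formula follows the widely circulated SimAM reference implementation (squared deviation from the channel mean, divided by four times the variance plus a small constant, plus 0.5, passed through a sigmoid). The function name, the 2D-list input format, and the default `lam` are assumptions for illustration; the input needs at least two values.

```python
import math


def simam(feature_map, lam=1e-4):
    """Apply a SimAM-style gate to one channel given as a 2D list of floats."""
    values = [v for row in feature_map for v in row]
    n = len(values) - 1                      # spatial size minus one
    mu = sum(values) / len(values)
    sq_dev = [[(v - mu) ** 2 for v in row] for row in feature_map]
    var = sum(d for row in sq_dev for d in row) / n
    out = []
    for row, dev_row in zip(feature_map, sq_dev):
        # Gate = sigmoid(d / (4 * (var + lam)) + 0.5); no learned weights involved.
        out.append([v / (1.0 + math.exp(-(d / (4 * (var + lam)) + 0.5)))
                    for v, d in zip(row, dev_row)])
    return out
```

Positions that deviate most from the channel mean receive the largest gate, so distinctive activations are emphasised while the module itself contributes no trainable parameters.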
    Improved YOLOv8 Small Target Detection Algorithm in Aerial Images
    FU Jinyi, ZHANG Zijia, SUN Wei, ZOU Kaixin
    Computer Engineering and Applications    2024, 60 (6): 100-109.   DOI: 10.3778/j.issn.1002-8331.2311-0281
    Abstract (317)      PDF(pc) (771KB) (277)
    In aerial image detection, objects are small relative to the overall image, scale characteristics vary, and detail information is unclear, which leads to missed and false detections. To address this, an improved small target detection algorithm, CA-YOLOv8, is proposed. Channel feature partial convolution (CFPConv) is designed and used to reconstruct the Bottleneck structure in C2f, yielding a module named CFP_C2f. Some C2f modules in the YOLOv8 head and neck are replaced with it, which strengthens the effective channel feature weights and improves the ability to obtain multi-scale detail features. A context aggregated module (CAM) is embedded to improve context aggregation, optimize the response of feature channels, and strengthen the perception of detail in deep features. The NWD loss function is added and combined with CIoU as the localization regression loss to reduce sensitivity to position deviation. To make full use of multiple attention mechanisms, the original detection head is replaced with DyHead (dynamic head). In experiments on the VisDrone2019 dataset, the improved algorithm reduces the number of parameters by 33.3% compared with the original YOLOv8s model, and mAP50 and mAP50:95 increase by 8.7 and 5.7 percentage points respectively, showing good performance and confirming its effectiveness.
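The NWD term mentioned above is usually motivated by the fact that IoU drops to zero as soon as two tiny boxes stop overlapping. The sketch below computes the normalized Wasserstein distance under its common formulation, in which each box is modelled as a 2D Gaussian and the similarity is the exponential of the negative Wasserstein distance divided by a constant. The function name and the value of `c` are placeholders (the constant is dataset-dependent), and how the paper weights NWD against CIoU is not stated in the abstract.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between boxes given as (cx, cy, w, h)."""
    (cxa, cya, wa, ha), (cxb, cyb, wb, hb) = box_a, box_b
    # 2-Wasserstein distance between the Gaussians N((cx, cy), diag(w^2/4, h^2/4)).
    w2 = math.sqrt((cxa - cxb) ** 2 + (cya - cyb) ** 2
                   + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-w2 / c)
```

Two 4×4 boxes whose centres are 6 pixels apart have an IoU of zero, yet `nwd((10, 10, 4, 4), (16, 10, 4, 4))` still lies strictly between 0 and 1 and decreases smoothly as the boxes move further apart, which gives the regression a usable signal for small targets.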
    Review of Research on Artificial Intelligence in Traditional Chinese Medicine Diagnosis and Treatment
    SU Youli, HU Xuanyu, MA Shijie, ZHANG Yuning, Abudukelimu Abulizi, Halidanmu Abudukelimu
    Computer Engineering and Applications    2024, 60 (16): 1-18.   DOI: 10.3778/j.issn.1002-8331.2312-0400
    Abstract (268)      PDF(pc) (6171KB) (277)
    The field of traditional Chinese medicine (TCM) diagnosis and treatment is gradually moving towards standardization, objectification, modernization, and intelligence. In this process, the integration of artificial intelligence (AI) has greatly propelled the advancement of TCM diagnosis and treatment, scientific research, and TCM inheritance. The review starts from the current research status of AI in TCM, combs through the application and development of AI in TCM in three stages from expert system and rule engines, traditional machine learning algorithm to deep learning, and then summarizes the knowledge management tools and large language models of TCM in recent years. Finally, this paper analyzes the multiple challenges of data fairness, multimodal data understanding, model robustness, personalized medicine, and interpretability that exist at this stage of AI in TCM. To address these challenges, it is necessary to continuously explore and propose possible solutions to promote the in-depth development of intelligent TCM diagnosis and treatment, thus better meeting the health needs of people.
    Survey on Attack Methods and Defense Mechanisms in Federated Learning
    ZHANG Shiwen, CHEN Shuang, LIANG Wei, LI Renfa
    Computer Engineering and Applications    2024, 60 (5): 1-16.   DOI: 10.3778/j.issn.1002-8331.2306-0243
    Abstract (252)      PDF(pc) (792KB) (271)
    Attack and defense techniques are the core issue of federated learning system security. Studying them can significantly reduce the risk of attack and greatly enhance the security of federated learning systems, and a deep understanding of them can advance research in the field and promote the widespread application of federated learning. It is therefore of great significance to study these techniques. Firstly, this paper briefly introduces the concept, basic workflow, types, and potential security issues of federated learning. Subsequently, it introduces the attacks that a federated learning system may encounter and summarizes the relevant research along the way. Then, according to whether the federated learning system has targeted defense measures, the defense measures are divided into two categories, universal defense measures and targeted defense measures, and each category is summarized. Finally, the paper reviews and analyzes future research directions for the security of federated learning, providing a reference for researchers working on this topic.
    Review of Medical Image Segmentation Algorithms Based on U-Net Variants
    CUI Ke, TIAN Qichuan, LIAN Lu
    Computer Engineering and Applications    2024, 60 (11): 32-49.   DOI: 10.3778/j.issn.1002-8331.2310-0335
    Abstract (179)      PDF(pc) (6802KB) (267)
    The simple and efficient network structure of U-Net is widely used in medical image segmentation, and many scholars have studied the U-Net structure. Firstly, the paper summarizes the key challenges of the U-Net network in the field of medical image segmentation. Next, it describes the formats and characteristics of the medical image datasets commonly used with the U-Net network. Then, it summarizes six improvement mechanisms of U-Net: the skip connection mechanism, generative adversarial network, residual connection mechanism, 3D-UNet, Transformer mechanism, and dense connection mechanism. Finally, the paper discusses the relationship between these improvement mechanisms and commonly used medical data formats, and points out ideas and directions for future improvement, so as to realize the potential of U-Net in medical image segmentation.
    Process of Weakly Supervised Salient Object Detection
    YU Junwei, GUO Yuansen, ZHANG Zihao, MU Yashuang
    Computer Engineering and Applications    2024, 60 (10): 1-15.   DOI: 10.3778/j.issn.1002-8331.2308-0206
    Abstract194)      PDF(pc) (6029KB)(264)       Save
    Salient object detection aims to accurately detect and locate the most attention-grabbing objects or regions in images or videos, facilitating better object recognition and scene analysis. Despite the effectiveness of fully supervised saliency detection methods, acquiring large pixel-level annotated datasets is challenging and costly. Weakly supervised detection methods utilize relatively easy-to-obtain image-level labels or noisy weak labels to train models, demonstrating good performance in practical applications. This paper comprehensively compares the mainstream methods and application scenarios of fully supervised and weakly supervised saliency detection methods, and then analyzes the data annotation methods using weak labels and their impact on salient object detection. The latest research progress in salient object detection under weakly supervised conditions is reviewed, and the performance of various weakly supervised methods is compared on several public datasets. Finally, the potential applications of weakly supervised saliency detection methods in special fields such as agriculture, medicine and military are discussed, highlighting the existing challenges and future development trends in this research area.
    Reference | Related Articles | Metrics
    Research Advance of Crack Detection for Infrastructure Surfaces Based on Deep Learning
    HU Xiangkun, LI Hua, FENG Yixiong, QIAN Songrong, LI Jian, LI Shaobo
    Computer Engineering and Applications    2025, 61 (1): 1-23.   DOI: 10.3778/j.issn.1002-8331.2407-0407
    Abstract222)      PDF(pc) (9136KB)(263)       Save
    Civil infrastructure is prone to changes in physical condition or performance after long-term use, which damage its function and service safety, so it is essential to monitor the structural health of such facilities. Crack detection is an extremely important part of structural health monitoring: timely detection and identification of such damage can effectively avoid severe accidents. Crack detection methods based on computer vision are simple, fast and accurate, and are widely used for surface crack detection in civil infrastructure. This paper reviews deep learning-based crack detection methods for infrastructure surfaces from three detection directions: image classification, object detection, and semantic segmentation. Common data collection methods and commonly used public crack datasets are also summarized. Finally, the difficulties and challenges of deep learning-based surface crack detection methods for infrastructure are discussed, and possible future development directions are outlined.
    Reference | Related Articles | Metrics
    Research Progress of Image Style Transfer Based on Neural Network
    LIAN Lu, TIAN Qichuan, TAN Run, ZHANG Xiaohang
    Computer Engineering and Applications    2024, 60 (9): 30-47.   DOI: 10.3778/j.issn.1002-8331.2309-0204
    Abstract219)      PDF(pc) (7029KB)(256)       Save
    Image style transfer is the process of re-rendering the content of a specified image in the style of a style image, and it is a research hotspot in artificial intelligence and computer vision. Traditional image style transfer methods are mainly based on physical modeling and texture synthesis techniques, and their results are rough and lack robustness. With the emergence of image datasets and the proposal of various deep learning models, many models and algorithms for image style transfer have appeared. This paper analyzes the current status of image style transfer research, traces its development and latest research progress, and gives future research directions through comparative analysis.
    Reference | Related Articles | Metrics
    Review of Connected Autonomous Vehicle Cooperative Control at On-Ramp Merging Areas
    LI Chun, WU Zhizhou, ZENG Guang, ZHAO Xin, YANG Zhidan
    Computer Engineering and Applications    2024, 60 (12): 1-17.   DOI: 10.3778/j.issn.1002-8331.2310-0310
    Abstract166)      PDF(pc) (5963KB)(255)       Save
    The area where ramp and mainline vehicles merge is referred to as the on-ramp merging area. Traffic efficiency in the merging area decreases drastically once the mainline and ramp traffic flow density reaches saturation. As a current research hotspot in transportation, intelligent connected vehicle technology, relying on the high-precision motion control and high-efficiency communication of connected-automated vehicles (CAVs), can significantly improve traffic efficiency in the merging area. This paper assesses the merging strategies used by CAVs under three control paradigms: feedback control, optimal control, and reinforcement learning. The shortcomings of the three methods in this scenario are summarized, and specific improvement measures are given by reviewing existing research. The paper also offers a thorough summary of the most recent developments and trends in this field.
    Reference | Related Articles | Metrics
    Review of Unsupervised Domain Adaptation in Medical Image Segmentation
    HU Wei, XU Qiaozhi, GE Xiangwei, YU Lei
    Computer Engineering and Applications    2024, 60 (6): 10-26.   DOI: 10.3778/j.issn.1002-8331.2307-0421
    Abstract267)      PDF(pc) (756KB)(249)       Save
    Medical image segmentation has broad application prospects in medical image processing, providing auxiliary information for diagnosis and treatment by locating and segmenting organs, tissues, or lesion areas of interest. However, there is a domain shift problem between different modalities of medical images, which can significantly degrade the performance of a segmentation model in actual deployment. Domain adaptation is an effective way to solve this problem, especially unsupervised domain adaptation, which has become a research hotspot in medical image processing because it does not require label information from the target domain. At present, there are relatively few reviews of unsupervised domain adaptation research in medical image segmentation. Therefore, this paper summarizes and analyzes recent unsupervised domain adaptation research in medical image segmentation and discusses its future prospects, hoping to help researchers quickly become familiar with the current research status and trends in this field.
    Reference | Related Articles | Metrics
    Small Object Detection Algorithm Based on ATO-YOLO
    SU Jia, QIN Yichang, JIA Ze, WANG Jing
    Computer Engineering and Applications    2024, 60 (6): 68-77.   DOI: 10.3778/j.issn.1002-8331.2308-0385
    Abstract263)      PDF(pc) (795KB)(243)       Save
    Small object detection is of great significance in the field of computer vision. However, existing methods often suffer from missed detections and false alarms when dealing with challenges such as scale variation, dense object arrangement, and irregular layouts. To address these problems, ATO-YOLO, an improved version of the YOLOv5 algorithm, is proposed. Firstly, this paper introduces an adaptive feature extraction (AFE) module that incorporates an attention mechanism to enhance the feature representation capability of the detection model. By dynamically adjusting the weight allocation to highlight key object features, AFE improves the accuracy and robustness of object detection in various scenarios. Secondly, a triple feature fusion (TFF) mechanism is designed to make effective use of multi-scale information by fusing feature maps from different scales, resulting in more comprehensive object features and better detection performance for small objects. Lastly, an output reconstruction (ORS) module is introduced, which removes the large object detection layer and adds a small object detection layer, enabling precise localization and recognition of small objects. This module also reduces model complexity and improves detection speed compared to the original model. Experimental results demonstrate that the ATO-YOLO algorithm achieves an mAP@0.5 of 38.2% on the VisDrone dataset, an improvement of 6.1 percentage points over YOLOv5, with a relative FPS increase of 4.4%. This algorithm enables fast and accurate detection of small objects.
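    The ATO-YOLO abstract quotes its mAP gain in percentage points and its FPS gain as a relative percentage, which are different measures. The short Python sketch below works through the distinction using only the two mAP figures given in the abstract. The YOLOv5 baseline is not stated there; the value computed here is merely implied by subtraction and may differ from the paper's own table because of rounding.

```python
def baseline_from_gain(improved_pct: float, gain_points: float) -> float:
    """Baseline score implied by an improved score and a gain in percentage points."""
    return improved_pct - gain_points


def relative_gain(improved_pct: float, baseline_pct: float) -> float:
    """Relative improvement over the baseline, expressed in percent."""
    return (improved_pct - baseline_pct) / baseline_pct * 100.0


# Figures quoted in the abstract: mAP@0.5 of 38.2%, 6.1 percentage points above YOLOv5.
ato_map = 38.2
gain_pp = 6.1

# Implied (not reported) baseline, obtained by simple subtraction.
yolov5_map = baseline_from_gain(ato_map, gain_pp)
# The same gain expressed relative to that implied baseline.
rel = relative_gain(ato_map, yolov5_map)

print(f"implied YOLOv5 mAP@0.5: {yolov5_map:.1f}%")
print(f"relative mAP gain: {rel:.1f}%")
```

    Under these assumptions, a gain of 6.1 percentage points corresponds to a relative gain of roughly 19%, so the two kinds of figure should not be compared directly.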
    Reference | Related Articles | Metrics
    Review of Text Classification Methods Based on Graph Neural Networks
    SU Yilei, LI Weijun, LIU Xueyang, DING Jianping, LIU Shixia, LI Haonan, LI Guanfeng
    Computer Engineering and Applications    2024, 60 (19): 1-17.   DOI: 10.3778/j.issn.1002-8331.2403-0142
    Abstract206)      PDF(pc) (3425KB)(241)       Save
    Text classification is an important task in natural language processing that aims to assign given text data to a predefined set of categories. Traditional text classification methods can only handle data in Euclidean space and cannot directly process non-Euclidean data such as graphs, so they fail to capture the structure of text data that is organized as a graph. Therefore, how to apply graph neural networks to text classification tasks is one of the current research hotspots. This paper reviews text classification methods based on graph neural networks. Firstly, it outlines the traditional text classification methods based on machine learning and deep learning, and summarizes the background and principles of graph convolutional neural networks. Secondly, it elaborates on text classification methods based on graph neural networks according to the type of graph network, and analyzes in depth the application of graph neural network models in text classification. Then, it compares current graph neural network-based text classification models through comparative experiments and discusses their classification performance. Finally, it proposes future research directions to further promote the development of this field.
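    As background for the graph convolution principle mentioned in this abstract, here is a minimal Python sketch of a single graph convolution layer in the widely used form ReLU(D^-1/2 (A + I) D^-1/2 X W). It is an illustration only and is not taken from the reviewed paper: the toy graph, the node features and the weight matrix are hypothetical, and real text classification models build much larger word and document graphs.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # at least 1 because of the self-loops
    # Symmetric normalisation by node degree.
    a_norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(a_norm, feats), weight)
    return [[max(v, 0.0) for v in row] for row in out]


# Hypothetical toy graph: three nodes connected in a path 0 - 1 - 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical 2-dimensional node features
weight = [[1.0, -1.0], [0.5, 2.0]]             # hypothetical 2 -> 2 weight matrix

hidden = gcn_layer(adj, feats, weight)
for row in hidden:
    print([round(v, 3) for v in row])
```

    Each output row mixes a node's own features with those of its neighbours, which is how such a layer uses the non-Euclidean graph structure described above.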
    Reference | Related Articles | Metrics
    Review of Application of Visual Foundation Model SAM in Medical Image Segmentation
    SUN Xing, CAI Xiaohong, LI Ming, ZHANG Shuai, MA Jingang
    Computer Engineering and Applications    2024, 60 (17): 1-16.   DOI: 10.3778/j.issn.1002-8331.2401-0136
    Abstract252)      PDF(pc) (7912KB)(238)       Save
    With the continuous development of foundation model technology, visual foundation models represented by the segment anything model (SAM) have made significant breakthroughs in the field of image segmentation. Driven by prompts, SAM accomplishes a series of downstream segmentation tasks and aims to address image segmentation problems comprehensively. The application of SAM in medical image segmentation is therefore of great significance, as its generalization ability allows it to adapt to various medical images, providing healthcare professionals with a more comprehensive understanding of anatomical structures and pathological information. This paper introduces commonly used datasets for image segmentation and explains SAM’s network architecture and generalization capabilities in detail. It focuses on a thorough analysis of SAM’s application in five major categories of medical images: whole-slide imaging, magnetic resonance imaging, computed tomography, ultrasound, and multimodal images. The review summarizes the strengths and weaknesses of SAM, along with corresponding improvement methods. In light of current challenges in medical image segmentation, the paper discusses and anticipates future directions for SAM’s development.
    Reference | Related Articles | Metrics
    Lightweight Foggy Weather Object Detection Method Based on YOLOv5
    LAI Jing’an, CHEN Ziqiang, SUN Zongwei, PEI Qingqi
    Computer Engineering and Applications    2024, 60 (6): 78-88.   DOI: 10.3778/j.issn.1002-8331.2308-0029
    Abstract248)      PDF(pc) (1220KB)(236)       Save
    To address the low accuracy and high model complexity of object detection algorithms in foggy scenes, a lightweight foggy weather object detection method based on YOLOv5 is proposed. Firstly, this paper adopts the receptive field attention module (RFAblock), which adds an attention mechanism to the receptive field by interacting with receptive field feature information, to improve feature extraction ability. Secondly, the lightweight network Slimneck is used as the neck structure to reduce model parameters and complexity while maintaining accuracy. The angle vector between the ground-truth box and the predicted box is introduced into the loss function to improve training speed and inference accuracy. PNMS (precise non-maximum suppression) is used to improve the candidate box selection mechanism and reduce the missed detection rate when vehicles are occluded. Finally, experiments on the real foggy dataset RTTS and the synthetic foggy dataset Foggy Cityscapes show that mAP50 is improved by 4.9 and 3.5 percentage points, respectively, compared with YOLOv5l, while the number of model parameters is only 54.6% of that of YOLOv5l.
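    The abstract does not describe how PNMS works, so the Python sketch below shows only the standard greedy non-maximum suppression that such variants modify. It is not the authors' PNMS. The box format (x1, y1, x2, y2), the 0.5 threshold and the sample boxes are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap any already-kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two heavily overlapping boxes and one separate box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

    Because the second box overlaps the first far beyond the threshold, plain NMS discards it. With occluded vehicles that discarded box may be a genuine second object, which is the kind of missed detection the abstract says PNMS is meant to reduce.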
    Reference | Related Articles | Metrics
    Improved YOLOv8 Method for Anomaly Behavior Detection with Multi-Scale Fusion and FMB
    SHI Yangyu, ZUO Jing, XIE Chengjie, ZHENG Diwen, LU Shuhua
    Computer Engineering and Applications    2024, 60 (9): 101-110.   DOI: 10.3778/j.issn.1002-8331.2401-0240
    Abstract191)      PDF(pc) (7946KB)(235)       Save
    To resolve the problems of anomaly behavior detection, including multi-scale variations, missed and false detections, and complex background interference, a method that incorporates multi-scale feature fusion and a fast multi-cross block (FMB) is proposed. With YOLOv8 as the baseline network, an FMB is designed in the backbone to enhance awareness of context information and reduce network parameters. Meanwhile, a spatial-progressive convolution pooling (S-PCP) module is proposed to achieve multi-scale information fusion, thereby reducing the missed and false detections caused by scale differences and improving detection accuracy. A SimAM attention mechanism is introduced in the neck to suppress complex background interference and improve object detection performance, and WIoU is used to balance the penalty on anchor boxes, enhancing the model’s generalization performance. The proposed method has been extensively validated on the UCSD-Ped1 and UCSD-Ped2 datasets, and its generalization has been tested on the OPIXray dataset. The results indicate that the proposed method, with fewer parameters, achieves varying degrees of improvement in anomaly behavior recognition accuracy compared with many advanced detection algorithms, demonstrating that it is an effective method for detecting pedestrian anomaly behavior.
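    For readers unfamiliar with SimAM, the Python sketch below applies its parameter-free weighting to a single feature-map channel, following the commonly published formulation: inverse energy = squared deviation / (4 * (variance + lambda)) + 0.5, passed through a sigmoid. This is an assumption-laden illustration, not the paper's implementation. The default lambda of 1e-4 and the toy 3 x 3 channel are hypothetical choices, and a real network would apply this to every channel of a tensor.

```python
import math


def simam_channel(channel, e_lambda=1e-4):
    """Apply SimAM-style weighting to one channel given as a list of rows (H x W)."""
    values = [v for row in channel for v in row]
    n = len(values) - 1  # number of "other" neurons; assumes H * W > 1
    mean = sum(values) / len(values)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n
    out = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Neurons that deviate more from the channel mean get a larger value here.
            inv_energy = d / (4.0 * (variance + e_lambda)) + 0.5
            gate = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid gate in (0, 1)
            out_row.append(v * gate)
        out.append(out_row)
    return out


# Hypothetical channel: a flat background with one distinctive activation in the centre.
channel = [[1.0, 1.0, 1.0],
           [1.0, 5.0, 1.0],
           [1.0, 1.0, 1.0]]
weighted = simam_channel(channel)
centre_gate = weighted[1][1] / channel[1][1]
corner_gate = weighted[0][0] / channel[0][0]
print(round(centre_gate, 3), round(corner_gate, 3))
```

    The distinctive centre activation receives a larger gate than the uniform background, which matches the role the abstract assigns to SimAM of suppressing background interference without adding learnable parameters.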
    Reference | Related Articles | Metrics
    Review of Object Detection Based on Event Cameras
    ZHANG Yali, TIAN Qichuan, TANG Chaolin
    Computer Engineering and Applications    2024, 60 (13): 23-35.   DOI: 10.3778/j.issn.1002-8331.2312-0322
    Abstract169)      PDF(pc) (5613KB)(233)       Save
    Event cameras are imaging devices that mimic the biological retina, offering high dynamic range, low latency, high temporal resolution and low power consumption. They overcome the difficulty that traditional cameras have in capturing and recognizing targets under high dynamic range conditions, and these characteristics make event cameras valuable for studying object detection. This paper first briefly describes the status, development process, advantages and challenges of event cameras, then introduces the working principles of various types of event cameras and several object detection algorithms based on them, and finally discusses the challenges and future trends of event camera-based object detection algorithms and summarizes the article.
    Reference | Related Articles | Metrics
    Survey of Deep Learning Based Approaches for Gaze Estimation
    WEN Mingqi, REN Luqian, CHEN Zhenqin, YANG Zhuo, ZHAN Yinwei
    Computer Engineering and Applications    2024, 60 (12): 18-33.   DOI: 10.3778/j.issn.1002-8331.2309-0497
    Abstract166)      PDF(pc) (6991KB)(231)       Save
    Gaze estimation is a technique for predicting the gaze position or gaze direction of the human eye and plays an important role in human-computer interaction and computer vision applications. The recent development of deep learning has revolutionized many computer vision tasks, and using deep learning for appearance-based gaze estimation has also become a hot topic. Focusing on the training process of the deep learning model, this paper analyzes state-of-the-art gaze estimation methods from four perspectives: gaze data preprocessing, gaze feature extraction, gaze learning strategies, and deep gaze model structures. In addition, the mainstream public datasets are summarized, and the performance of 2D and 3D gaze estimation methods is evaluated and analyzed on several popular datasets. Finally, the challenges faced by existing gaze estimation methods are discussed, and future development directions are outlined.
    Reference | Related Articles | Metrics
    Research and Progress on Super-Resolution Reconstruction Methods for Terahertz Images
    JIANG Yuying, JIANG Mengdie, GE Hongyi, ZHANG Yuan, LI Guangming, CHEN Xinyu, WEN Xixi, CHEN Hao
    Computer Engineering and Applications    2024, 60 (18): 1-16.   DOI: 10.3778/j.issn.1002-8331.2401-0161
    Abstract186)      PDF(pc) (6043KB)(227)       Save
    Image super-resolution has been an important research topic in image processing in recent decades, aiming to reconstruct a high-resolution image from a low-resolution image. It overcomes the limitations imposed by the manufacturing process and cost of sensors and optical devices, and improves image resolution algorithmically, making it a simple, efficient and low-cost approach. As an emerging technology, terahertz (THz) technology has been widely used in many fields. Owing to THz diffraction and scattering, THz images suffer from blur and unclear texture details, and more and more scholars are committed to developing super-resolution reconstruction methods for THz images. Based on a study of recent literature on THz technology and super-resolution reconstruction, this paper describes the three major reconstruction methods for THz images, focuses on deep learning-based methods, and compares the reconstruction results, advantages and disadvantages of various algorithms. The quality assessment indexes and commonly used datasets for THz images are reviewed, and applications of THz image super-resolution reconstruction are summarized. Finally, the future development trend of THz image super-resolution reconstruction technology is discussed.
    Reference | Related Articles | Metrics
    Research on Unmanned Aerial Vehicle Swarm Resilience Assessment and Reconfiguration Technology
    WEI Chenyue, HE Ming, HAN Wei, XU Xin, GAO Hong
    Computer Engineering and Applications    2024, 60 (15): 1-10.   DOI: 10.3778/j.issn.1002-8331.2401-0452
    Abstract178)      PDF(pc) (4418KB)(223)       Save
    In practical applications, unmanned aerial vehicle (UAV) swarms are often affected by disturbances such as terrain, wind, snow, rain, fog, and anti-aircraft strikes, which degrade swarm performance and mission accomplishment capability. In order to effectively assess and improve the anti-disturbance capability of swarms, an in-depth study is carried out on UAV swarm resilience assessment indexes and resilience reconfiguration methods. Firstly, the current research status of UAV swarm resilience assessment indexes is reviewed and analyzed. Secondly, the research on UAV swarm resilience reconfiguration methods is summarized in terms of predictive reconfiguration and anti-disturbance reconfiguration. To address the problems of incomplete assessment indexes and the inability of swarms to reconfigure adaptively under multi-task and multi-disturbance situations, multi-dimensional resilience assessment indexes and a UAV swarm phase change reconfiguration method are proposed respectively, which further take into account the impact of coverage, energy consumption and other factors on swarm performance, realize adaptive phase change for different types of tasks and disturbances, and significantly improve the swarm’s ability to cope with disturbances. Finally, the paper concludes and looks ahead to the future development trend of UAV swarm resilience reconfiguration.
    Reference | Related Articles | Metrics
    Overview of Causal Learning Techniques and Applications
    LONG Xiangfu, LI Shaobo, ZHANG Yizong, YANG Lei, LI Chuanjiang
    Computer Engineering and Applications    2024, 60 (24): 1-19.   DOI: 10.3778/j.issn.1002-8331.2405-0407
    Abstract166)      PDF(pc) (6887KB)(222)       Save
    Machine learning is the core of artificial intelligence and data science, and is widely used in education, transportation and manufacturing. With the development of machine learning and the extension of its application fields, models have revealed unsolved problems in terms of interpretability and fairness. Causal learning (CL), a method that combines causality with machine learning techniques, can enhance the interpretability of models and address fairness problems, and it has gradually become a research hotspot in academia. Therefore, after introducing the relevant theoretical knowledge of CL, this paper firstly analyzes and outlines the techniques of causal explanation, causal supervised learning, causal fairness, and causal reinforcement learning according to the problems that CL can solve. Secondly, the applications of CL in medicine, agriculture and intelligent manufacturing are summarized from multiple perspectives. Finally, some open problems and challenges of CL are summarized, and future research directions are given, aiming to promote the continuous development of CL.
    Reference | Related Articles | Metrics
    Lightweight Face Recognition Algorithm Combining Transformer and CNN
    LI Ming, DANG Qingxia
    Computer Engineering and Applications    2024, 60 (14): 96-104.   DOI: 10.3778/j.issn.1002-8331.2311-0276
    Abstract135)      PDF(pc) (3685KB)(214)       Save
    With the development of deep learning, convolutional neural networks have become the mainstream approach for face recognition (FR) by gradually expanding the receptive field through stacking convolutional layers to integrate local features. However, this approach suffers from the drawbacks of neglecting global semantic information of faces and lacking attention to important facial features, resulting in low recognition accuracy. Additionally, the stacking of a large number of parameters and layers poses challenges for deploying the network on resource-constrained devices. Therefore, a highly lightweight face recognition algorithm called gcsamTfaceNet is proposed, which combines Transformer and CNN. Firstly, a depthwise separable convolution is used to construct the backbone network in order to reduce the parameter count of the algorithm. Secondly, a channel-spatial attention mechanism is introduced to optimize the selection of features in both the channel and spatial domains, thereby improving the attention given to important facial regions. Building upon this, the Transformer module is integrated to capture the global semantic information of the feature maps, overcoming the limitations of convolutional neural networks in modeling long-range semantic dependencies and enhancing the algorithm’s ability to perceive global features. The gcsamTfaceNet, with a parameter count of only 6.5×10^5, is evaluated on nine validation datasets including LFW, CA-LFW, CP-LFW, CFP-FP, CFP-FF, AgeDB-30, VGG2-FP, IJB-B, and IJB-C. It achieves average accuracies of 99.67%, 95.60%, 89.32%, 93.67%, 99.65%, 96.35%, 93.36%, 89.43%, and 91.38% on these datasets, respectively. This demonstrates a good balance between parameter count and performance.
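    To show why a depthwise separable convolution reduces the parameter count, the Python sketch below compares the weight counts of a standard convolution and a depthwise separable one for a single layer. The layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) are hypothetical and are not taken from gcsamTfaceNet; biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


c_in, c_out, k = 64, 128, 3    # hypothetical layer sizes
std = standard_conv_params(c_in, c_out, k)
dws = depthwise_separable_params(c_in, c_out, k)
print(std, dws, round(std / dws, 1))
```

    For this hypothetical layer the standard convolution has 73,728 weights and the depthwise separable version 8,768, about 8.4 times fewer. Savings of this order per layer are what make a total on the scale of 10^5 parameters feasible.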
    Reference | Related Articles | Metrics
    Review of Deep Learning Approaches for Recognizing Multiple Unsafe Behaviors in Workers
    SU Chenyang, WU Wenhong, NIU Hengmao, SHI Bao, HAO Xu, WANG Jiamin, GAO Le, WANG Weitai
    Computer Engineering and Applications    2024, 60 (5): 30-46.   DOI: 10.3778/j.issn.1002-8331.2307-0168
    Abstract224)      PDF(pc) (808KB)(214)       Save
    With the development of deep learning, target detection and behavior recognition methods have made great progress in recognizing unsafe behaviors of workers. This paper systematically summarizes the relevant research at home and abroad in recent years, describes the commonly used models and the performance of target detection methods and behavior recognition methods, and focuses on reviewing the application of the two types of methods to unsafe behavior recognition as well as research that combines them. The advantages, limitations, recognized behavior categories and applicable scenarios of the various methods are comprehensively analyzed and compared. On this basis, the paper summarizes recent optimization measures for target detection and behavior recognition, including the commonly used optimization directions and techniques and the improvements that have been successfully applied to unsafe behavior recognition. It also sorts out the difficulties and problems in this research field and gives suggestions and future development trends, providing a reference for research in this field.
    Reference | Related Articles | Metrics
    Review on Deep Learning-Based 2D Single-Person Pose Estimation
    SU Yanyan, QIU Zhiliang, LI Guo, LU Shenglian, CHEN Ming
    Computer Engineering and Applications    2024, 60 (21): 18-37.   DOI: 10.3778/j.issn.1002-8331.2403-0152
    Abstract108)      PDF(pc) (7680KB)(212)       Save
    Human pose estimation is a key technology in the field of computer vision that identifies human postures by detecting body keypoints. With the rapid advancement of deep learning, it has become the dominant approach in human pose estimation, achieving significant progress. This paper reviews single-person pose estimation research based on deep learning, examining the issue from four perspectives: data preprocessing, network architecture design, supervised learning methods, and post-processing techniques. It also explores new representations of keypoints and the application of Transformer models in this area. Additionally, the paper introduces common datasets and performance metrics, and delves into the current challenges and future directions in the field of single-person pose estimation.
    Reference | Related Articles | Metrics
    Survey on Automated Recognition and Extraction of TTPs
    YU Fengrui
    Computer Engineering and Applications    2024, 60 (13): 1-22.   DOI: 10.3778/j.issn.1002-8331.2309-0489
    Abstract163)      PDF(pc) (7424KB)(209)       Save
    In the ever-evolving landscape of cyber threats, tactics, techniques and procedures (TTPs) play a crucial role in understanding malicious activities, providing a fine-grained perspective on the status of cybersecurity, and comprehensively illustrating cyber attack behaviors. Despite significant research efforts in the automated identification and extraction of TTPs, a comprehensive systematic review is currently lacking. This paper presents an in-depth analysis of progress in this area across three principal approaches: traditional natural language processing, machine learning, and large language models. The study categorizes the tasks into information extraction, text classification, and text generation, and summarizes the general framework of the identification and extraction process. It clearly defines the scope of unstructured text and TTPs, refines the processing and analysis procedures, and identifies innovative directions for each approach. Moreover, building upon existing research, the paper identifies current challenges and proposes future research directions and development opportunities. This comprehensive survey serves as a valuable literature review to support readers in applying advanced technologies and methods to advance research in this field.
    Reference | Related Articles | Metrics
    Research on Gesture Recognition Based on Improved YOLOv5 and Mediapipe
    NI Guangxing, XU Hua, WANG Chao
    Computer Engineering and Applications    2024, 60 (7): 108-118.   DOI: 10.3778/j.issn.1002-8331.2308-0097
    Abstract263)      PDF(pc) (686KB)(208)       Save
    Existing gesture recognition algorithms suffer from heavy computation and poor robustness. In this paper, a gesture recognition method based on the IYOLOv5-Med (improved YOLOv5 Mediapipe) algorithm is proposed. The algorithm combines an improved YOLOv5 algorithm with the Mediapipe method and consists of two parts, gesture detection and gesture analysis. In the gesture detection part, the traditional YOLOv5 algorithm is improved. Firstly, the C3 module is reconstructed by FastNet. Secondly, the CBS module is replaced by the GhostConv module in GhostNet. Thirdly, the SE attention mechanism module is introduced at the end of the Backbone network. The improved algorithm has a smaller model size and is more suitable for edge devices with limited resources. In the gesture analysis part, a method based on Mediapipe is proposed: the key points of the hand are detected in the gesture area located by the gesture detection part, relevant features are extracted from them, and the gesture is then identified by a naive Bayes classifier. The experimental results confirm the effectiveness of the IYOLOv5-Med algorithm. Compared with the conventional YOLOv5 algorithm, the parameters are reduced by 34.5%, the computation is reduced by 34.9%, and the model weight is decreased by 33.2%. The final average recognition rate reaches 0.997, and the implementation is relatively simple, giving the method good application prospects.
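    The final classification step described in this abstract, features derived from hand key points passed to a naive Bayes classifier, can be sketched in a few lines of Python. The abstract does not say which naive Bayes variant or which features the authors use, so the Gaussian variant, the two distance features, the gesture labels and the training samples below are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate per-class priors, feature means and feature variances."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-6
                     for c, m in zip(cols, means)]
        model[y] = (len(rows) / len(samples), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior, treating features as independent."""
    def log_posterior(params):
        prior, means, variances = params
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_lik
    return max(model, key=lambda y: log_posterior(model[y]))


# Hypothetical features per sample: (thumb-to-index distance, index-to-pinky distance),
# both normalised by palm size. Values and labels are made up for illustration.
samples = [(0.10, 0.20), (0.12, 0.25), (0.90, 1.10), (0.95, 1.00)]
labels = ["fist", "fist", "open", "open"]
model = fit_gaussian_nb(samples, labels)
print(predict(model, (0.11, 0.22)))
```

    Because the classifier only needs per-class means and variances, it adds very little computation on top of key point detection, which is consistent with the abstract's emphasis on resource-limited edge devices.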
    Reference | Related Articles | Metrics
    Multiview Interaction Learning Network for Multimodal Aspect-Level Sentiment Analysis
    WANG Xuyang, PANG Wenqian, ZHAO Lijie
    Computer Engineering and Applications    2024, 60 (7): 92-100.   DOI: 10.3778/j.issn.1002-8331.2210-0288
    Abstract155)      PDF(pc) (591KB)(203)       Save
    Previous multimodal aspect-level sentiment analysis methods use only the general text and picture representations of a pre-trained model, which are insensitive to the correlation between aspects and opinion words, and they cannot dynamically obtain the contribution of picture information to word representations, so they cannot fully recognize the correlation between modalities and aspects. To address these problems, a multiview interaction learning network is proposed. In order to make full use of the global features of the text in multimodal interaction, sentence features are extracted from the context view and the syntax view respectively. The relationships among text, picture and aspect are modeled to realize multimodal interaction. At the same time, the interactive representations of different modalities are fused to dynamically obtain the contribution of visual information to each word in the text, so that the correlation between modalities and aspects is fully extracted. Finally, the sentiment classification results are obtained through a fully connected layer and a Softmax layer. Experiments on two datasets show that the model effectively improves multimodal aspect-level sentiment classification.
    Reference | Related Articles | Metrics