Most Read articles

    Published in last 1 year |  In last 2 years |  In last 3 years |  All

    In last 2 years
    YOLOv5 Helmet Wear Detection Method with Introduction of Attention Mechanism
    WANG Lingmin, DUAN Jun, XIN Liwei
    Computer Engineering and Applications    2022, 58 (9): 303-312.   DOI: 10.3778/j.issn.1002-8331.2112-0242
    Abstract viewed 1308 times | PDF (1381 KB, 735 downloads)
    For high-risk industries such as steel manufacturing, coal mining and construction, wearing a helmet during work is one of the most effective ways to avoid injury. Current helmet-wearing detection models suffer from false detections and missed detections for small and dense targets in complex environments, so an improved YOLOv5 target detection method is proposed for helmet-wearing detection. A coordinate attention(CA) mechanism is added to the YOLOv5 backbone, embedding location information into channel attention so that the network can attend over a larger area. The original feature pyramid module in the feature fusion stage is replaced with a weighted bi-directional feature pyramid network(BiFPN) to achieve efficient bi-directional cross-scale connections and weighted feature fusion. Experimental results on a homemade helmet dataset show that the improved YOLOv5 model achieves an average precision of 95.9%, 5.1 percentage points higher than the original YOLOv5 model, meeting the requirements of small and dense target detection in complex environments.
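    The BiFPN fusion mentioned above weights each input feature map with a learnable, ReLU-clipped scalar and normalizes the weights so the result is a convex combination of the inputs; a minimal numpy sketch (the shapes and weight values here are illustrative, not the paper's):

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style fast normalized fusion: ReLU the learnable scalar
    weights, then normalize them so the fused map is a convex
    combination of the same-shape input feature maps."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, features))

# two same-shape feature maps fused with scalar weights 3:1
f1 = np.ones((4, 4))
f2 = np.zeros((4, 4))
fused = fast_normalized_fusion([f1, f2], weights=[3.0, 1.0])
```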
    Reference | Related Articles | Metrics
    Research Progress on Vision System and Manipulator of Fruit Picking Robot
    GOU Yuanmin, YAN Jianwei, ZHANG Fugui, SUN Chengyu, XU Yong
    Computer Engineering and Applications    2023, 59 (9): 13-26.   DOI: 10.3778/j.issn.1002-8331.2209-0183
    Abstract viewed 1279 times | PDF (787 KB, 850 downloads)
    Fruit picking robots are of great significance to the automation and intelligence of fruit-harvesting equipment. This paper summarizes recent research at home and abroad on the key technologies of fruit picking robots. First, the vision system is discussed: traditional image segmentation methods based on fruit features, such as thresholding, edge detection, color-feature clustering and region-based segmentation, are reviewed, and object recognition algorithms based on deep learning and target fruit localization are analyzed and compared. Then, the state of the art of fruit picking robot manipulators and end-effectors is summarized. Finally, future development trends and directions of fruit picking robots are discussed, providing a reference for related research.
    Reference | Related Articles | Metrics
    Overview of Multi-Agent Path Finding
    LIU Zhifei, CAO Lei, LAI Jun, CHEN Xiliang, CHEN Ying
    Computer Engineering and Applications    2022, 58 (20): 43-64.   DOI: 10.3778/j.issn.1002-8331.2203-0467
    Abstract viewed 1259 times | PDF (1013 KB, 539 downloads)
    The multi-agent path finding(MAPF) problem is the fundamental problem of planning paths for multiple agents, where the key constraint is that the agents must be able to follow these paths concurrently without colliding with each other. MAPF is widely used in logistics, military, security and other fields. When the main research results at home and abroad are systematically sorted and classified by planning method, MAPF algorithms divide into centralized planning algorithms and distributed execution algorithms. Centralized planning algorithms are the most classical and the most commonly used MAPF algorithms; they fall into four types: based on A* search, conflict-based search, increasing cost tree search, and reduction to other problems. Distributed execution algorithms are based on reinforcement learning and, according to the improvement technique used, divide into three types: expert demonstration, improved communication, and task decomposition. By comparing the characteristics and applicability of MAPF algorithms and analyzing the advantages and disadvantages of existing algorithms, the challenges of existing algorithms are pointed out and future work is forecast.
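    The core MAPF constraint, collision-free concurrent execution, reduces to checking two conflict types between time-indexed paths (same cell at the same step, or two agents swapping cells); a small illustrative sketch with made-up grid coordinates:

```python
def find_conflict(path_a, path_b):
    """Return the first vertex or swap (edge) conflict between two
    time-indexed agent paths, or None if they can run concurrently."""
    T = max(len(path_a), len(path_b))
    pos = lambda p, t: p[min(t, len(p) - 1)]  # agents wait at their goal
    for t in range(T):
        if pos(path_a, t) == pos(path_b, t):
            return ("vertex", t, pos(path_a, t))
        if (t > 0 and pos(path_a, t) == pos(path_b, t - 1)
                and pos(path_b, t) == pos(path_a, t - 1)):
            return ("edge", t, (pos(path_a, t - 1), pos(path_a, t)))
    return None

# agents A and B swap cells (0,1) and (0,2) between t=1 and t=2
a = [(0, 0), (0, 1), (0, 2)]
b = [(0, 2), (0, 2), (0, 1)]
conflict = find_conflict(a, b)
```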
    Reference | Related Articles | Metrics
    Research on Object Detection Algorithm Based on Improved YOLOv5
    QIU Tianheng, WANG Ling, WANG Peng, BAI Yan’e
    Computer Engineering and Applications    2022, 58 (13): 63-73.   DOI: 10.3778/j.issn.1002-8331.2202-0093
    Abstract viewed 1173 times | PDF (1109 KB, 497 downloads)
    YOLOv5 currently performs well in single-stage target detection, but the accuracy of its bounding box regression is limited, making it hard to apply in scenarios with high requirements on the IoU of prediction boxes. Based on YOLOv5, this paper proposes a new model, YOLO-G, with low hardware requirements, fast convergence and high bounding box accuracy. Firstly, the feature pyramid network(FPN) is improved so that more features are integrated through cross-level connections, which prevents the loss of shallow semantic information to a certain extent; at the same time the pyramid is deepened and a corresponding detection layer is added, so that the spacing of the anchor boxes is more reasonable. Secondly, a parallel-mode attention mechanism is integrated into the network, giving equal priority to the spatial and channel attention modules; the attention information is then extracted by weighted fusion, so that the network mixes the two attention domains according to their relative importance. Finally, to prevent the loss of real-time performance due to increased model complexity, the network is lightened to reduce its parameters and computation. The PASCAL VOC 2007 and 2012 datasets are used to verify the effectiveness of the algorithm. Compared with YOLOv5s, YOLO-G reduces the number of parameters by 4.7% and the amount of computation by 47.9%, while mAP@0.5 and mAP@0.5:0.95 increase by 3.1 and 5.6 percentage points respectively.
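    As a rough illustration of the parallel-mode attention described above, the following numpy sketch computes channel and spatial attention independently from the same input (equal priority) and fuses the two refined maps by a weight; the pooling and gating choices here are simplifying assumptions, not the paper's exact design:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def parallel_attention(x, alpha=0.5):
    """Parallel-mode mixed attention sketch for an (C, H, W) feature map:
    channel and spatial attention maps are computed independently, each
    applied to the input, and the two refined maps are weight-fused."""
    ca = sigmoid(x.mean(axis=(1, 2)))[:, None, None]  # channel gate (C,1,1)
    sa = sigmoid(x.mean(axis=0))[None, :, :]          # spatial gate (1,H,W)
    return alpha * (x * ca) + (1.0 - alpha) * (x * sa)

x = np.random.default_rng(0).normal(size=(8, 16, 16))
y = parallel_attention(x)  # same shape as the input feature map
```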
    Reference | Related Articles | Metrics
    Survey of Opponent Modeling Methods and Applications in Intelligent Game Confrontation
    WEI Tingting, YUAN Weilin, LUO Junren, ZHANG Wanpeng
    Computer Engineering and Applications    2022, 58 (9): 19-29.   DOI: 10.3778/j.issn.1002-8331.2202-0297
    Abstract viewed 1068 times | PDF (904 KB, 411 downloads)
    Intelligent game confrontation has always been a focus of artificial intelligence research. In a game confrontation environment, the actions, goals, strategies and other attributes of agents can be inferred through opponent modeling, which provides key information for formulating game strategies. Opponent modeling is promising in competitive games and combat simulation, and since the formulation of a game strategy must be premised on the action strategies of all parties in the game, building an accurate model of opponent behavior to predict its intention is especially important. From the three dimensions of connotation, method and application, the necessity of opponent modeling is expounded and existing modeling methods are classified. Prediction methods based on reinforcement learning, reasoning methods based on theory of mind, and Bayesian optimization methods are summarized. Taking a sequential game(Texas Hold'em), a real-time strategy game(StarCraft) and meta-games as typical application scenarios, the role of opponent modeling in intelligent game confrontation is analyzed. Finally, the development of opponent modeling technology is prospected from the three aspects of bounded rationality, deception strategies and interpretability.
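    A Bayesian opponent model in its simplest form maintains a posterior over opponent types and updates it after each observed action; a toy sketch (the types, actions and likelihood values are invented for illustration):

```python
def bayes_update(prior, likelihood, observation):
    """Posterior over opponent strategy types after observing one action.
    prior: {type: P(type)}; likelihood: {type: {action: P(action|type)}}."""
    post = {t: prior[t] * likelihood[t].get(observation, 0.0) for t in prior}
    z = sum(post.values())
    return {t: p / z for t, p in post.items()} if z > 0 else dict(prior)

# toy poker example: aggressive opponents raise far more often
prior = {"aggressive": 0.5, "passive": 0.5}
lik = {"aggressive": {"raise": 0.8, "fold": 0.2},
       "passive":    {"raise": 0.2, "fold": 0.8}}
post = bayes_update(prior, lik, "raise")  # belief shifts toward "aggressive"
```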
    Reference | Related Articles | Metrics
    Research on Local Path Planning Algorithm Based on Improved TEB Algorithm
    DAI Wanyu, ZHANG Lijuan, WU Jiafeng, MA Xianghua
    Computer Engineering and Applications    2022, 58 (8): 283-288.   DOI: 10.3778/j.issn.1002-8331.2108-0290
    Abstract viewed 984 times | PDF (878 KB, 193 downloads)
    When the traditional TEB(timed elastic band) algorithm plans paths in a complex dynamic environment, unsmooth velocity commands cause path oscillations, which impose large impacts on the robot and make it prone to collisions. To address these problems, the traditional TEB algorithm is improved. Detected irregular obstacles are inflated and classified by region, and routes through the safe region are given priority, so that the robot runs more safely and smoothly in complex environments. The obstacle distance is added to the velocity constraint of the algorithm, which effectively reduces the oscillation amplitude and the impact caused by velocity jumps when the robot approaches an obstacle, ensuring the safety of the robot during operation. Extensive comparative simulations in the ROS environment show that in complex dynamic environments the path planned by the improved TEB algorithm is safer and smoother, effectively reducing impacts on the robot.
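    The obstacle-distance velocity constraint can be pictured as a ramp that replaces an abrupt velocity jump near obstacles; a hypothetical sketch (the distance thresholds and linear profile are illustrative assumptions, not the paper's values):

```python
def speed_limit(v_max, d_obs, d_safe=0.3, d_slow=1.5):
    """Obstacle-distance speed constraint sketch: full speed beyond
    d_slow, linear slow-down between d_slow and d_safe (meters),
    full stop inside d_safe - so velocity never jumps discontinuously."""
    if d_obs <= d_safe:
        return 0.0
    if d_obs >= d_slow:
        return v_max
    return v_max * (d_obs - d_safe) / (d_slow - d_safe)

# far from obstacles the robot keeps v_max; near them it ramps down
v_far, v_mid, v_near = speed_limit(1.0, 2.0), speed_limit(1.0, 0.9), speed_limit(1.0, 0.2)
```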
    Reference | Related Articles | Metrics
    Survey of Deep Clustering Algorithm Based on Autoencoder
    TAO Wenbin, QIAN Yurong, ZHANG Yiyang, MA Hengzhi, LENG Hongyong, MA Mengnan
    Computer Engineering and Applications    2022, 58 (18): 16-25.   DOI: 10.3778/j.issn.1002-8331.2204-0049
    Abstract viewed 914 times | PDF (724 KB, 321 downloads)
    As a common analysis method, cluster analysis is widely used in various scenarios. With the development of machine learning technology, deep clustering has become a hot research topic, and deep clustering based on autoencoders is one of its representative approaches. To keep abreast of the development of autoencoder-based deep clustering algorithms, four autoencoder models are introduced, and representative algorithms of recent years are classified according to autoencoder structure. Traditional clustering algorithms and autoencoder-based deep clustering algorithms are compared experimentally on the MNIST, USPS and Fashion-MNIST datasets. Finally, the current problems of autoencoder-based deep clustering algorithms are summarized, and possible research directions of deep clustering are prospected.
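    One representative autoencoder-based algorithm, DEC, clusters in the learned embedding space using a Student's t soft assignment between embedded points and cluster centroids; a numpy sketch of that assignment step (embeddings and centroids here are toy values):

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """DEC-style soft assignment: Student's t kernel between embedded
    points z (n, d) and centroids (k, d); each row of the returned
    (n, k) matrix is a probability distribution over clusters."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

z = np.array([[0.0, 0.0], [4.0, 4.0]])       # two embedded points
c = np.array([[0.0, 0.0], [4.0, 4.0]])       # two centroids
q = soft_assign(z, c)  # each point assigns mostly to its nearest centroid
```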
    Reference | Related Articles | Metrics
    Survey of Transformer-Based Object Detection Algorithms
    LI Jian, DU Jianqiang, ZHU Yanchen, GUO Yongkun
    Computer Engineering and Applications    2023, 59 (10): 48-64.   DOI: 10.3778/j.issn.1002-8331.2211-0133
    Abstract viewed 865 times | PDF (875 KB, 488 downloads)
    Transformer is a deep learning framework with strong modeling and parallel computing capabilities, and object detection algorithms based on Transformer have become a research hotspot. To explore new ideas and directions, this paper summarizes existing Transformer-based object detection algorithms as well as a variety of object detection datasets and their application scenarios. It describes the relevant algorithms from four aspects, i.e. feature extraction, object estimation, label matching policy and application of algorithms; compares Transformer-based algorithms with object detection algorithms based on convolutional neural networks; analyzes the advantages and disadvantages of Transformer in object detection tasks; and proposes a general framework for Transformer-based object detection models. Finally, the development trend of Transformer in the field of object detection is prospected.
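    A common label matching policy in Transformer detectors (e.g. DETR) is one-to-one set matching: predictions are assigned to ground-truth boxes by minimizing a total matching cost, normally with the Hungarian algorithm. A brute-force sketch of the same assignment for a tiny, invented cost matrix:

```python
from itertools import permutations

def best_matching(cost):
    """Exhaustive minimum-cost one-to-one matching of predictions to
    ground-truth targets (what the Hungarian algorithm solves
    efficiently); cost[i][j] is the cost of pairing prediction i
    with target j. Only feasible for tiny n."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(best)

cost = [[0.1, 0.9, 0.8],
        [0.7, 0.2, 0.9],
        [0.8, 0.9, 0.3]]
assignment = best_matching(cost)  # prediction i -> target assignment[i]
```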
    Reference | Related Articles | Metrics
    Review of Visual Odometry Methods Based on Deep Learning
    ZHI Henghui, YIN Chenyang, LI Huibin
    Computer Engineering and Applications    2022, 58 (20): 1-15.   DOI: 10.3778/j.issn.1002-8331.2203-0480
    Abstract viewed 817 times | PDF (904 KB, 438 downloads)
    Visual odometry(VO) is a common method for localizing mobile devices equipped with vision sensors, and has been widely used in autonomous driving, mobile robots, AR/VR and other fields. Compared with traditional model-based methods, deep learning-based methods can learn efficient and robust feature representations from data without explicit computation, improving their ability to handle challenging scenes such as illumination changes and weak texture. This paper first briefly reviews model-based visual odometry methods, and then focuses on six aspects of deep learning-based visual odometry: supervised learning methods, unsupervised learning methods, model-learning fusion methods, common datasets, evaluation metrics, and the comparison of model-based and deep learning-based methods. Finally, existing problems and future development trends of deep learning-based visual odometry are discussed.
    Reference | Related Articles | Metrics
    Survey on Image Semantic Segmentation in Dilemma of Few-Shot
    WEI Ting, LI Xinlei, LIU Hui
    Computer Engineering and Applications    2023, 59 (2): 1-11.   DOI: 10.3778/j.issn.1002-8331.2205-0496
    Abstract viewed 804 times | PDF (4301 KB, 590 downloads)
    In recent years, image semantic segmentation has developed rapidly thanks to the emergence of large-scale datasets. In practical applications, however, large-scale high-quality images are not easy to obtain, and image annotation consumes a lot of manpower and time. To get rid of the dependence on sample numbers, few-shot semantic segmentation has gradually become a research hotspot. Current few-shot semantic segmentation methods mainly use the idea of meta-learning and, according to model structure, can be divided into three categories: methods based on siamese neural networks, on prototype networks, and on attention mechanisms. Based on current research, this paper introduces the development, advantages and disadvantages of these methods, as well as common datasets and experimental designs. On this basis, application scenarios and future development directions are summarized.
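    Prototype-network methods typically build a class prototype by masked average pooling over support features and then score query pixels by similarity to it; a minimal numpy sketch (feature sizes and values are illustrative):

```python
import numpy as np

def masked_average_prototype(feat, mask):
    """Class prototype via masked average pooling: mean of support
    features (C, H, W) over the pixels where the binary mask (H, W) is 1."""
    w = mask.astype(float)
    return (feat * w).sum(axis=(1, 2)) / (w.sum() + 1e-8)

def cosine_score(feat, proto):
    """Per-pixel cosine similarity between query features and a prototype."""
    f = feat / (np.linalg.norm(feat, axis=0, keepdims=True) + 1e-8)
    p = proto / (np.linalg.norm(proto) + 1e-8)
    return np.tensordot(p, f, axes=([0], [0]))  # (H, W) score map

feat = np.zeros((2, 2, 2))
feat[:, 0, 0] = [1.0, 0.0]       # one "foreground" pixel
feat[:, 1, 1] = [0.0, 1.0]       # one "background" pixel
mask = np.array([[1, 0], [0, 0]])
proto = masked_average_prototype(feat, mask)   # ~[1, 0]
scores = cosine_score(feat, proto)             # high only at the fg pixel
```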
    Reference | Related Articles | Metrics
    Review of Research on Small Target Detection Based on Deep Learning
    ZHANG Yan, ZHANG Minglu, LYU Xiaoling, GUO Ce, JIANG Zhihong
    Computer Engineering and Applications    2022, 58 (15): 1-17.   DOI: 10.3778/j.issn.1002-8331.2112-0176
    Abstract viewed 755 times | PDF (995 KB, 421 downloads)
    The task of target detection is to quickly and accurately identify and locate predefined categories of objects in an image. With the development of deep learning techniques, detection algorithms have achieved good industrial results for large and medium targets. The performance of deep learning-based small target detection still needs further improvement and optimization, owing to characteristics of small targets in images such as small size, incomplete features and a large gap between them and the background. Small target detection is in wide demand in fields such as autonomous driving, medical diagnosis and UAV navigation, so the research has high application value. Based on extensive literature research, this paper first defines small target detection and identifies its current difficulties. It analyzes the research status in six research directions based on these difficulties and summarizes the advantages and disadvantages of each algorithm. Finally, combining the literature and the state of development, it makes reasonable predictions and outlooks on future research directions in this field, providing a basic reference for subsequent research.
    Reference | Related Articles | Metrics
    Review of Deep Reinforcement Learning Model Research on Vehicle Routing Problems
    YANG Xiaoxiao, KE Lin, CHEN Zhibin
    Computer Engineering and Applications    2023, 59 (5): 1-13.   DOI: 10.3778/j.issn.1002-8331.2210-0153
    Abstract viewed 739 times | PDF (1036 KB, 426 downloads)
    The vehicle routing problem(VRP) is a classic NP-hard problem widely encountered in transportation, logistics and other fields. As problem scale and dynamic factors increase, traditional methods for solving the VRP are challenged in computational speed and intelligence. In recent years, with the rapid development of artificial intelligence, and in particular the successful application of reinforcement learning in AlphaGo, new ideas for solving routing problems have emerged. This paper therefore surveys recent literature on using deep reinforcement learning(DRL) to solve the VRP and its variants. Firstly, it reviews the relevant principles of solving the VRP with DRL and sorts out the key steps of DRL-based solution methods. Then it systematically classifies and summarizes four types of solution methods: pointer networks, graph neural networks, Transformer, and hybrid models, and compares and analyzes the performance of current DRL-based models on the VRP and its variants. Finally, it sums up the challenges of solving the VRP with DRL and future research directions.
    Reference | Related Articles | Metrics
    Overview of Smoke and Fire Detection Algorithms Based on Deep Learning
    ZHU Yuhua, SI Yiyi, LI Zhihui
    Computer Engineering and Applications    2022, 58 (23): 1-11.   DOI: 10.3778/j.issn.1002-8331.2206-0154
    Abstract viewed 734 times | PDF (782 KB, 410 downloads)
    Among various disasters, fire is one of the main hazards that most frequently and universally threatens public safety and social development. With rapid economic construction and the growing size of cities, the number of major fire hazards has increased dramatically. However, the widely used smoke-sensor approach to fire detection is vulnerable to factors such as distance, resulting in delayed detection. Video surveillance systems provide new ideas for solving this problem. Traditional video-based image processing algorithms were proposed earlier, and the recent rapid development of machine vision and image processing technology has produced a series of methods that use deep learning to automatically detect fire in video and images, with very important practical applications in fire safety. To comprehensively analyze the improvements and applications of deep learning methods for fire detection, this paper first briefly introduces the deep learning-based fire detection process, then presents a detailed comparative analysis of deep learning methods for fire detection at three granularities: classification, detection and segmentation, and elaborates the improvements each class of algorithms makes for existing problems. Finally, the problems of fire detection at the present stage are summarized and future research directions are proposed.
    Reference | Related Articles | Metrics
    Review of Research on Road Traffic Flow Data Prediction Methods
    MENG Chuang, WANG Hui, LIN Hao, LI Kecen, WANG Xinpeng
    Computer Engineering and Applications    2023, 59 (14): 51-61.   DOI: 10.3778/j.issn.1002-8331.2209-0458
    Abstract viewed 729 times | PDF (605 KB, 319 downloads)
    As an important branch of intelligent transportation systems, road traffic flow prediction plays an important role in congestion prediction and path planning. The spatio-temporal variability and complex correlations of road traffic flow data are forcing prediction methods to transform and upgrade in the era of big data. To mine the spatio-temporal characteristics of traffic flow and improve prediction accuracy, scholars have proposed various methods, including model fusion, algorithmic improvements and data transformation. To summarize these methods reasonably, they are divided into three categories according to the techniques used: statistics-based, machine learning-based and deep learning-based methods. This paper summarizes and analyzes the new models and algorithms of recent years, aiming to provide research ideas for relevant researchers. Finally, traffic flow prediction methods are summarized and prospected, and directions for future exploration of the field are given.
    Reference | Related Articles | Metrics
    Survey of Transformer Research in Computer Vision
    LI Xiang, ZHANG Tao, ZHANG Zhe, WEI Hongyang, QIAN Yurong
    Computer Engineering and Applications    2023, 59 (1): 1-14.   DOI: 10.3778/j.issn.1002-8331.2204-0207
    Abstract viewed 727 times | PDF (1285 KB, 469 downloads)
    Transformer is a deep neural network based on the self-attention mechanism. In recent years, Transformer-based models have become a hot research direction in computer vision, and their structures are constantly being improved and extended, for example with local attention mechanisms and pyramid structures. This paper reviews improved vision models based on the Transformer structure from the perspectives of performance optimization and structural improvement. In addition, the advantages and disadvantages of the Transformer and convolutional neural network(CNN) structures are compared and analyzed, and the new hybrid CNN+Transformer structure is introduced. Finally, the development of Transformer in computer vision is summarized and prospected.
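    The self-attention mechanism at the heart of these vision Transformers can be written in a few lines of numpy; a single-head sketch over patch tokens (token count and dimensions are illustrative):

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention: project tokens to
    queries/keys/values, softmax the scaled similarity scores over
    tokens, and return the attention-weighted values."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    a = np.exp(scores - scores.max(axis=-1, keepdims=True))
    a = a / a.sum(axis=-1, keepdims=True)        # softmax over tokens
    return a @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                      # 5 patch tokens, dim 8
W = [rng.normal(size=(8, 8)) for _ in range(3)]  # random projections
out = self_attention(x, *W)                      # same shape as x
```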
    Reference | Related Articles | Metrics
    Research Progress of YOLO Series Target Detection Algorithms
    WANG Linyi, BAI Jing, LI Wenjing, JIANG Jinzhe
    Computer Engineering and Applications    2023, 59 (14): 15-29.   DOI: 10.3778/j.issn.1002-8331.2301-0081
    Abstract viewed 720 times | PDF (1009 KB, 479 downloads)
    YOLO-based algorithms are among the hot research directions in target detection. In recent years, with the successive proposal of YOLO series algorithms and their improved models, YOLO-based algorithms have achieved excellent results in target detection and have been widely applied in many real-world fields. This article first introduces the typical datasets and evaluation indices for target detection and reviews the overall YOLO framework and the development of the detection algorithms from YOLOv1 to YOLOv7. Then, improved models and their performance are summarized across eight improvement directions, such as data augmentation, lightweight network construction and IoU loss optimization, at the three stages of input, feature extraction and prediction. Afterwards, the application fields of YOLO algorithms are introduced. Finally, combined with actual problems in target detection, the development direction of YOLO-based algorithms is summarized and prospected.
    Reference | Related Articles | Metrics
    Survey of Camera Pose Estimation Methods Based on Deep Learning
    WANG Jing, JIN Yuchu, GUO Ping, HU Shaoyi
    Computer Engineering and Applications    2023, 59 (7): 1-14.   DOI: 10.3778/j.issn.1002-8331.2209-0280
    Abstract viewed 718 times | PDF (702 KB, 365 downloads)
    Camera pose estimation is a technology for accurately estimating the 6-DOF position and orientation of a camera in the world coordinate system in a known environment, and it is a key technology in robotics and autonomous driving. With the rapid development of deep learning, using deep learning to optimize camera pose estimation algorithms has become one of the current research hotspots. To grasp the current research status and trends of camera pose estimation algorithms, the mainstream deep learning-based algorithms are summarized. Firstly, traditional camera pose estimation methods based on feature points are briefly introduced. Then, deep learning-based methods are described in detail: according to their core algorithms, end-to-end camera pose estimation, scene coordinate regression, retrieval-based camera pose estimation, hierarchical structures, multi-information fusion and cross-scene camera pose estimation are elaborated and analyzed. Finally, the current research status is summarized, the challenges in the field of camera pose estimation are pointed out based on in-depth performance analysis, and the development trend of camera pose estimation is prospected.
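    Whatever the estimation method, 6-DOF pose accuracy is usually reported as a translation distance plus a rotation geodesic angle; a numpy sketch of these two standard error metrics (the example poses are invented):

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Standard 6-DOF pose errors: Euclidean translation error (same
    units as t) and rotation geodesic angle in degrees, computed from
    the trace of the relative rotation R_gt^T R_est."""
    t_err = np.linalg.norm(t_est - t_gt)
    cos = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    r_err = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return t_err, r_err

# example: 90-degree yaw error and 1 m translation error
R_gt = np.eye(3)
R_est = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
t_err, r_err = pose_errors(R_est, np.array([1.0, 0.0, 0.0]), R_gt, np.zeros(3))
```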
    Reference | Related Articles | Metrics
    Research Progress in Application of Graph Anomaly Detection in Financial Anti-Fraud
    LIU Hualing, LIU Yaxin, XU Junyi, CHEN Shanghui, QIAO Liang
    Computer Engineering and Applications    2022, 58 (22): 41-53.   DOI: 10.3778/j.issn.1002-8331.2203-0233
    Abstract viewed 652 times | PDF (1848 KB, 456 downloads)
    With the rapid development of digital finance, fraud presents new characteristics such as intellectualization, industrialization and strong concealment, and the limitations of traditional expert rules and machine learning methods are increasingly apparent. Graph anomaly detection technology has a strong ability to deal with associated information, which provides new ideas for financial anti-fraud. Firstly, the development and advantages of graph anomaly detection are briefly introduced. Secondly, from the perspectives of individual and group anti-fraud, graph anomaly detection techniques are divided into individual fraud detection based on features, proximity, graph representation learning or community division, and gang fraud detection based on dense subgraphs, dense subtensors or deep network structures. The basic ideas, advantages, disadvantages, research progress and typical applications of each anomaly detection technique are analyzed and compared. Finally, common test datasets and evaluation criteria are summarized, and the development prospects and research directions of graph anomaly detection in financial anti-fraud are given.
    Reference | Related Articles | Metrics
    Overview of Image Edge Detection
    XIAO Yang, ZHOU Jun
    Computer Engineering and Applications    2023, 59 (5): 40-54.   DOI: 10.3778/j.issn.1002-8331.2209-0122
    Abstract viewed 640 times | PDF (921 KB, 302 downloads)
    The task of edge detection is to identify pixels with significant brightness changes as target edges. It is a low-level problem in computer vision, with important applications in object recognition and detection, object proposal generation and image segmentation. Edge detection methods now fall into several types, such as traditional gradient-based methods, deep learning-based algorithms, and methods combined with emerging technologies; a finer classification of these methods gives researchers a clearer understanding of trends in edge detection. Firstly, the theoretical basis and implementations of traditional edge detection are introduced. Then the main edge detection methods of recent years are summarized and classified according to the techniques used, and their core techniques, such as branching structures, feature fusion and loss functions, are introduced. Algorithm performance is assessed by the optimal dataset-scale F-measure(ODS) and frames per second(FPS), compared on the standard BSDS500 dataset. Finally, the current state of edge detection research is examined and summarized, and possible future research directions are prospected.
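    A detector in the traditional gradient-based family works by thresholding the gradient magnitude; a deliberately naive numpy Sobel sketch (real implementations vectorize the convolution and add steps such as non-maximum suppression):

```python
import numpy as np

def sobel_edges(img, thresh):
    """Classic gradient-based edge detection: convolve with the Sobel
    kernels, then mark pixels whose gradient magnitude exceeds the
    threshold. Border pixels are left unmarked for simplicity."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    H, W = img.shape
    gx, gy = np.zeros((H, W)), np.zeros((H, W))
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy) > thresh

img = np.zeros((8, 8))
img[:, 4:] = 1.0                      # vertical step edge at column 4
edges = sobel_edges(img, thresh=1.0)  # fires on both sides of the step
```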
    Reference | Related Articles | Metrics
    Small Object Detection Algorithm Based on Improved YOLOv5 in UAV Image
    XIE Chunhui, WU Jinming, XU Huaiyu
    Computer Engineering and Applications    2023, 59 (9): 198-206.   DOI: 10.3778/j.issn.1002-8331.2212-0336
    Abstract viewed 639 times | PDF (808 KB, 418 downloads)
    UAV aerial images have characteristics such as large scale variation and complex backgrounds, so existing detectors struggle to detect small objects in them. Aiming at the problems of false detection and missed detection, a small object detection model, Drone-YOLO, is proposed. A new detection branch is added to improve detection capability at multiple scales, and the model contains a novel feature pyramid network with multi-level information aggregation, which realizes the fusion of cross-layer information. A feature fusion module based on a multi-scale channel attention mechanism is then designed to improve the focus on small objects. The classification task of the prediction head is decoupled from the regression task, and the loss function is optimized with Alpha-IoU to improve detection accuracy. Experimental results on the VisDrone dataset show that Drone-YOLO improves AP50 by 4.91 percentage points over YOLOv5, with an inference time of only 16.78 ms. Compared with other mainstream models, it detects small targets better and can effectively complete small object detection tasks in UAV aerial images.
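    The Alpha-IoU optimization mentioned above generalizes the IoU loss with a power parameter; a sketch of the basic form 1 - IoU^alpha (the full Alpha-IoU family also covers penalized variants such as Alpha-CIoU, omitted here):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter)

def alpha_iou_loss(box_a, box_b, alpha=3.0):
    """Basic Alpha-IoU loss 1 - IoU^alpha: the power parameter
    up-weights the gradient on high-IoU boxes, sharpening localization."""
    return 1.0 - iou(box_a, box_b) ** alpha

pred, gt = (0, 0, 2, 2), (1, 0, 3, 2)   # overlapping unit-height boxes
loss = alpha_iou_loss(pred, gt)          # IoU = 1/3, loss = 1 - (1/3)**3
```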
    Reference | Related Articles | Metrics
    Study on Hierarchical Multi-Label Text Classification Method of MSML-BERT Model
    HUANG Wei, LIU Guiquan
    Computer Engineering and Applications    2022, 58 (15): 191-201.   DOI: 10.3778/j.issn.1002-8331.2111-0176
    Abstract viewed 635 times | PDF (1333 KB, 249 downloads)
    Hierarchical multi-label text classification is more challenging than ordinary multi-label text classification, since the multiple labels of a text form a tree-like hierarchy. Current methods use the same model structure to predict labels at different layers, ignoring their differences and diversity, and do not fully model the hierarchical dependencies, resulting in poor prediction performance at all layers, especially for lower-layer long-tail labels, and possibly in label inconsistency. To address these problems, a multi-task learning architecture is introduced and the MSML-BERT model is proposed. The model regards the label classification network of each layer in the label hierarchy as a learning task, and enhances the performance of tasks at all layers through the sharing and transfer of knowledge between tasks. On this basis, a multi-scale feature extraction module is designed to capture multi-scale and multi-grained features to form the knowledge required at different layers. Further, a multi-layer information propagation module is designed to fully model hierarchical dependencies and transfer knowledge to lower-layer tasks; within it, a hierarchical gating mechanism filters the knowledge flow between tasks in different layers. Extensive experiments on the RCV1-V2, NYT and WOS datasets show that the overall performance of this model, especially on lower-layer long-tail labels, surpasses that of other prevailing models while maintaining a low label inconsistency ratio.
    Reference | Related Articles | Metrics
    Research on Transformer-Based Single-Channel Speech Enhancement
    FAN Junyi, YANG Jibin, ZHANG Xiongwei, ZHENG Changyan
    Computer Engineering and Applications    2022, 58 (12): 25-36.   DOI: 10.3778/j.issn.1002-8331.2201-0371
    Abstract634)      PDF(pc) (1155KB)(270)       Save
    Deep learning can effectively solve the complex mapping problem between noisy and clean speech signals and thereby improve the quality of single-channel speech enhancement, yet the enhancement achieved by earlier network architectures remains unsatisfactory. Transformer has been widely used in the field of speech signal processing because it integrates a multi-head attention mechanism and can focus on the long-term correlations present in speech. On this basis, deep learning-based speech enhancement models are reviewed, the Transformer model and its internal structure are summarized, Transformer-based speech enhancement models are classified according to their implementation structures, and several example models are analyzed in detail. Furthermore, the performance of Transformer-based single-channel speech enhancement models is compared on public datasets, and their advantages and disadvantages are analyzed. The shortcomings of the related research work are summarized and future developments are envisaged.
    Reference | Related Articles | Metrics
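    The multi-head attention that lets Transformer capture long-term correlation in speech reduces, per head, to scaled dot-product attention over time frames. A minimal single-head, list-based sketch (function names are hypothetical):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(q, keys, values):
    """Scaled dot-product attention: score one query frame against every
    time frame, normalize the scores, and return the weighted sum of the
    value vectors -- so distant frames can still contribute."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    w = softmax(scores)
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
```

A multi-head layer runs several such attentions in parallel on learned projections and concatenates the results.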
    Review of Research on Driver Fatigue Driving Detection Methods
    ZHANG Rui, ZHU Tianjun, ZOU Zhiliang, SONG Rui
    Computer Engineering and Applications    2022, 58 (21): 53-66.   DOI: 10.3778/j.issn.1002-8331.2204-0053
    Abstract630)      PDF(pc) (946KB)(301)       Save
    The proportion of traffic accidents caused by fatigue driving has increased year by year, attracting widespread attention from researchers. At present, research on fatigue driving detection is constrained by factors such as the available technology, the environment and road conditions, which makes further development of detection techniques difficult. This article introduces the latest progress in driver fatigue detection methods over the past decade. The two broad categories, active detection methods and passive detection methods, are elaborated and reviewed, and each is further classified according to its distinguishing characteristics. The advantages and limitations of the various fatigue detection methods are then analyzed, and the detection algorithms used in facial-feature-based active detection methods in the past three years are analyzed and summarized. Finally, the shortcomings of existing fatigue detection methods are summarized and future research trends in the field are proposed, providing new ideas for further research.
    Reference | Related Articles | Metrics
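    Among facial-feature-based active detection methods, a widely used cue is the eye aspect ratio(EAR), which underlies PERCLOS-style eye-closure measures; a sketch assuming six 2D eye landmarks in the usual p1..p6 order (thresholds and landmark layout are illustrative):

```python
import math

def eye_aspect_ratio(eye):
    """EAR from 6 eye landmarks (p1..p6): ratio of the two vertical
    lid distances to the horizontal eye width. It drops toward 0 as the
    eye closes; a low EAR sustained over consecutive frames is a common
    fatigue cue."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    p1, p2, p3, p4, p5, p6 = eye
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

# An open eye: wide vertical gaps relative to the horizontal width.
open_eye = [(0, 0), (1, 2), (2, 2), (3, 0), (2, -2), (1, -2)]
print(eye_aspect_ratio(open_eye))
```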
    Systematic Review on Graph Deep Learning in Medical Image Segmentation
    WANG Guoli, SUN Yu, WEI Benzheng
    Computer Engineering and Applications    2022, 58 (12): 37-50.   DOI: 10.3778/j.issn.1002-8331.2112-0225
    Abstract627)      PDF(pc) (1194KB)(277)       Save
    High-precision segmentation of organs or lesions in medical images is a vital and challenging issue for intelligent medical image analysis, with important clinical application value for the auxiliary diagnosis and treatment of diseases. Recently, in solving challenging problems such as medical image information representation and the accurate modeling of non-Euclidean physiological tissue structures, graph deep learning based medical image segmentation has made important breakthroughs and shown significant advantages in feature extraction and characterization. The merged technology can also obtain more accurate segmentation results, and has become an emerging research hotspot in this field. To better promote the research and development of graph deep learning segmentation algorithms for medical images, this paper systematically summarizes the technological progress and application status of the field. It introduces the definition of graphs and the basic structure of graph convolutional networks, and elaborates on spectral and spatial graph convolution operations. Then, according to three technical structures combining GCN with residual modules, attention mechanism modules and learning modules, the research progress in medical image segmentation is summarized. Finally, the application and development of graph deep learning algorithms for medical image segmentation are summarized and prospected to provide references and guiding principles for related research.
    Reference | Related Articles | Metrics
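    The spatial graph convolution operation mentioned above aggregates each node's features with those of its neighbours. A minimal sketch that omits the learned weight matrix and nonlinearity (a hypothetical simplification of H' = D^-1(A + I)H):

```python
def gcn_layer(adj, feats):
    """One spatial graph-convolution step: replace each node's feature
    vector with the mean over itself and its neighbours (self-loop
    included). adj is a 0/1 adjacency matrix, feats a list of vectors."""
    n = len(adj)
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]] + [i]
        out.append([sum(feats[j][k] for j in nbrs) / len(nbrs)
                    for k in range(len(feats[0]))])
    return out

# Two connected nodes exchange information in one step.
print(gcn_layer([[0, 1], [1, 0]], [[0.0], [2.0]]))
```

Stacking such layers (with learned weights and activations) lets information propagate across the non-Euclidean tissue-structure graph.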
    Survey of Research on Deep Multimodal Representation Learning
    PAN Mengzhu, LI Qianmu, QIU Tian
    Computer Engineering and Applications    2023, 59 (2): 48-64.   DOI: 10.3778/j.issn.1002-8331.2206-0145
    Abstract604)      PDF(pc) (6521KB)(420)       Save
    Although deep learning has been widely used in many fields because of its powerful nonlinear representation capabilities, the structural and semantic gap between multi-source heterogeneous modal data seriously hinders the application of subsequent deep learning models. Many scholars have proposed a large number of representation learning methods to explore the correlation and complementarity between different modalities and to improve the prediction and generalization performance of deep learning models. However, research on multimodal representation learning is still in its infancy:it lacks a unified understanding, and the architectures and evaluation metrics of the field are not yet fully clear. According to the feature structure, semantic information and representation ability of different modalities, this paper analyzes the progress of deep multimodal representation learning from the perspectives of representation fusion and representation alignment, and systematically summarizes and classifies the existing research work. It analyzes the basic structure, application scenarios and key issues of representative frameworks and models, discusses the theoretical basis and latest development of deep multimodal representation learning, and points out the current challenges and future directions of multimodal representation learning research, so as to further promote its development and application.
    Reference | Related Articles | Metrics
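    The distinction between representation fusion and representation alignment can be illustrated minimally: fusion combines modality features into one representation, while alignment optimizes a cross-modal similarity such as cosine toward 1 for matched pairs. A toy sketch (function names are hypothetical):

```python
import math

def cosine(u, v):
    """Cosine similarity -- the quantity alignment methods push toward 1
    for matched image/text pairs and toward 0 (or below) for mismatches."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def fuse_concat(img_feat, txt_feat):
    """Simplest representation fusion: concatenate the per-modality
    features into one joint vector for a downstream predictor."""
    return img_feat + txt_feat
```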
    Overview of Cross-Modal Retrieval Technology
    XU Wenwan, ZHOU Xiaoping, WANG Jia
    Computer Engineering and Applications    2022, 58 (23): 12-23.   DOI: 10.3778/j.issn.1002-8331.2205-0160
    Abstract592)      PDF(pc) (769KB)(216)       Save
    Cross-modal retrieval retrieves information in one modality through a query in another, and has become a research hotspot in the era of big data. Researchers have used real-valued representation and binary representation to reduce the semantic gap between different modalities and compare similarity effectively, but problems of low retrieval efficiency or information loss remain. At present, further improving retrieval efficiency and information utilization is a key challenge for cross-modal retrieval research. Firstly, the development status of real-valued and binary representation in cross-modal retrieval is introduced. Secondly, five cross-modal retrieval methods based on modeling technology and similarity comparison under the two representation technologies are analyzed and compared:subspace learning, topic statistical model learning, deep learning, traditional hashing and deep hashing. Then, the latest multi-modal datasets are summarized to provide a valuable reference for relevant researchers and engineers. Finally, the challenges of cross-modal retrieval are analyzed and future research directions in this field are pointed out.
    Reference | Related Articles | Metrics
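    Binary representation trades some information for retrieval efficiency: real-valued features are binarised into hash codes and compared by Hamming distance instead of floating-point similarity. A toy sketch (the fixed threshold is a hypothetical stand-in for a learned hash function):

```python
def to_hash(feat, thresh=0.0):
    """Binarise a real-valued representation into a hash code; this is
    where the information loss mentioned above occurs."""
    return [1 if f > thresh else 0 for f in feat]

def hamming(a, b):
    """Hamming distance between two codes: cheap bit-level comparison
    that makes large-scale retrieval fast."""
    return sum(x != y for x, y in zip(a, b))

code_a = to_hash([0.5, -0.2, 0.9])
code_b = to_hash([0.4, 0.3, -0.1])
print(hamming(code_a, code_b))
```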
    TLS Malicious Encrypted Traffic Identification Research
    KANG Peng, YANG Wenzhong, MA Hongqiao
    Computer Engineering and Applications    2022, 58 (12): 1-11.   DOI: 10.3778/j.issn.1002-8331.2110-0029
    Abstract589)      PDF(pc) (747KB)(309)       Save
    With the advent of the 5G era and the public’s growing engagement with the Internet, more and more attention has been paid to the protection of personal privacy. Because malicious communication also hides within encrypted traffic, research on encrypted traffic identification is particularly important for ensuring data security and safeguarding social and national interests. Therefore, this paper describes TLS traffic in detail and analyzes improvements on early identification methods, including common traffic detection technology, DPI detection technology, proxy technology, and certificate detection technology. It also introduces machine learning models that select different TLS encrypted traffic characteristics, as well as many recent research results on deep learning models that require no feature selection. The deficiencies of the related research work are summarized, and future research directions and development trends of the technology are prospected.
    Reference | Related Articles | Metrics
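    Machine learning over selected TLS traffic characteristics can be illustrated with a nearest-centroid rule over flow statistics. This is a toy sketch: the feature choice and centroids are hypothetical, and practical systems train full classifiers on many features (packet sizes, timing, cipher suites, certificate fields):

```python
def classify_flow(features, benign_centroid, malicious_centroid):
    """Nearest-centroid sketch: label a flow by whichever class centroid
    its statistical feature vector is closer to (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return ("malicious" if sqdist(features, malicious_centroid)
            < sqdist(features, benign_centroid) else "benign")

# Hypothetical 1-D feature (e.g. a normalized mean record size).
print(classify_flow([1.5], benign_centroid=[0.0], malicious_centroid=[2.0]))
```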
    Construction and Application of Discipline Knowledge Graph in Personalized Learning
    ZHAO Yubo, ZHANG Liping, YAN Sheng, HOU Min, GAO Mao
    Computer Engineering and Applications    2023, 59 (10): 1-21.   DOI: 10.3778/j.issn.1002-8331.2209-0345
    Abstract582)      PDF(pc) (929KB)(380)       Save
    The discipline knowledge graph is an important tool for supporting teaching activities based on big data, artificial intelligence and other technologies. As a semantic network of discipline knowledge, it contributes to the development of personalized learning systems and the promotion of new infrastructure for digital education resources. Firstly, this paper outlines the concept and classification of knowledge graphs. Secondly, it summarizes the concept, characteristics, advantages and connotation of the discipline knowledge graph and its support for personalized learning. Next, it focuses on the construction process of the discipline knowledge graph:discipline ontology construction, discipline knowledge extraction, discipline knowledge fusion and discipline knowledge processing, and also introduces the application of the discipline knowledge graph in personalized learning scenarios and the associated challenges. Finally, this paper prospects future trends of the discipline knowledge graph and personalized learning, providing reference and inspiration for the organization of educational resources and the innovative development of personalized learning.
    Reference | Related Articles | Metrics
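    The product of discipline knowledge extraction and fusion is a set of (head, relation, tail) triples. A minimal sketch of storing and querying such triples (the relation name `requires` and the example concepts are hypothetical):

```python
def build_graph(triples):
    """Store (head, relation, tail) triples as an adjacency dict -- a
    minimal stand-in for a discipline knowledge graph."""
    g = {}
    for h, r, t in triples:
        g.setdefault(h, []).append((r, t))
    return g

def prerequisites(g, concept):
    """Query one relation type: the concepts a learner should master
    first -- the kind of lookup a personalized learning path relies on."""
    return [t for r, t in g.get(concept, []) if r == "requires"]

g = build_graph([("calculus", "requires", "limits"),
                 ("calculus", "part_of", "analysis")])
print(prerequisites(g, "calculus"))
```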
    Improved YOLOv7-tiny’s Object Detection Lightweight Model
    LIU Haohan, FAN Yiming, HE Huaiqing, HUI Kanghua
    Computer Engineering and Applications    2023, 59 (14): 166-175.   DOI: 10.3778/j.issn.1002-8331.2302-0115
    Abstract573)      PDF(pc) (830KB)(215)       Save
    Current object detection algorithms have large numbers of parameters and high computational complexity, while the storage capacity and computing power of mobile terminals are limited, making deployment difficult. This paper therefore proposes an improved YOLOv7-tiny for mobile terminal devices. An efficient backbone network and a lightweight feature fusion network are built with the improved ShuffleNet v1 and EALN-GS as basic building units. Combining the two reduces computational complexity, obtains richer semantic information, and further improves detection accuracy. The Mish activation function is used to increase nonlinear expressiveness and improve the generalization ability of the model. Experimental results show that, compared with the original model, the accuracy of the improved model is improved by 3.3%, the numbers of parameters and computations are reduced by 4.8% and 13.7% respectively, and the model size is reduced by 8.7%. The improved YOLOv7-tiny reduces the parameters and computations of the model while maintaining high accuracy, further improves the detection effect, and makes deployment on edge terminal devices feasible.
    Reference | Related Articles | Metrics
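    The Mish activation function used in the improved model is defined as mish(x) = x·tanh(ln(1 + e^x)); a direct sketch:

```python
import math

def mish(x):
    """Mish activation: x * tanh(softplus(x)). Smooth and non-monotonic,
    which helps gradient flow compared with hard-clipped activations."""
    return x * math.tanh(math.log1p(math.exp(x)))

# Near zero it is small; for large positive x it approaches the identity.
print(mish(0.0), mish(10.0))
```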
    Survey on Attention Mechanisms in Deep Learning Recommendation Models
    GAO Guangshang
    Computer Engineering and Applications    2022, 58 (9): 9-18.   DOI: 10.3778/j.issn.1002-8331.2112-0382
    Abstract551)      PDF(pc) (944KB)(286)       Save
    This paper explores how the attention mechanism helps a recommendation model dynamically focus on the specific parts of the input that are useful for the current recommendation task. It analyzes the attention mechanism network framework and the methods for computing the weights of its input data, and then summarizes the field from five perspectives:the vanilla attention mechanism, co-attention mechanism, self-attention mechanism, hierarchical attention mechanism, and multi-head attention mechanism. For each, it analyzes the key strategies, algorithms or techniques used to calculate the weights of the current input data. Using these weights, a recommendation model can focus on the necessary parts of the input at each step of the recommendation task, generate more effective user or item feature representations, and improve its operating efficiency and generalization ability. The attention mechanism helps the recommendation model assign different weights to each part of the input and extract more critical and important information, enabling more accurate judgments without adding much computation and storage overhead. Although existing deep learning recommendation models with attention mechanisms can meet the needs of most recommendation tasks to a certain extent, given the uncertainty of human needs and the explosive growth of information, they will still face challenges in recommendation diversity, recommendation interpretability, and the integration of multiple kinds of auxiliary information.
    Reference | Related Articles | Metrics
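    The vanilla attention weight calculation summarized above can be sketched as softmax-weighted pooling over a user's item history: each history item is scored against a query, scores are normalized into weights, and the weighted sum forms the user representation (a minimal illustration; all names are hypothetical):

```python
import math

def attention_pool(query, item_embs):
    """Vanilla attention over a user's item history: dot-product scores,
    softmax weights, then a weighted sum as the user representation --
    so items relevant to the current task dominate the profile."""
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in item_embs]
    m = max(scores)
    ws = [math.exp(s - m) for s in scores]
    z = sum(ws)
    ws = [w / z for w in ws]
    dim = len(item_embs[0])
    return [sum(w * emb[j] for w, emb in zip(ws, item_embs)) for j in range(dim)]
```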
    Review of Explainable Artificial Intelligence
    ZHAO Yanyu, ZHAO Xiaoyong, WANG Lei, WANG Ningning
    Computer Engineering and Applications    2023, 59 (14): 1-14.   DOI: 10.3778/j.issn.1002-8331.2208-0322
    Abstract517)      PDF(pc) (683KB)(352)       Save
    With the development of machine learning and deep learning, artificial intelligence technology has gradually been applied in various fields. However, one of the biggest drawbacks of adopting AI is its inability to explain the basis for its predictions. The black-box nature of the models makes it impossible for humans to truly trust them in mission-critical application scenarios such as healthcare, finance, and autonomous driving, thus limiting the practical application of AI in these areas. Driving the development of explainable artificial intelligence(XAI) has therefore become an important issue for deploying AI in mission-critical applications. At present, there is still a lack of review research on XAI at home and abroad, as well as a lack of studies focusing on causal explanation methods and the evaluation of explainable methods. Therefore, this study first starts from the characteristics of explanation methods and, from the perspective of explanation type, divides the main explainable methods into three categories:model-independent methods, model-dependent methods, and causal explanation methods, summarizing and analyzing each. It then summarizes the evaluation of explanation methods, lists applications of explainable AI, and finally discusses the current problems of explainability and provides an outlook.
    Reference | Related Articles | Metrics
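    Model-independent explanation methods treat the model as a black box, perturb its inputs, and observe the response; permutation feature importance is one common instance. A sketch under that idea, not any specific library's API:

```python
import random

def permutation_importance(model, X, y, col, metric, seed=0):
    """Model-agnostic explanation sketch: shuffle one feature column and
    report how much the model's score drops. A large drop means the model
    relied on that feature; zero means it was ignored."""
    base = metric(model(X), y)
    rng = random.Random(seed)
    shuffled = [row[:] for row in X]
    vals = [row[col] for row in shuffled]
    rng.shuffle(vals)
    for row, v in zip(shuffled, vals):
        row[col] = v
    return base - metric(model(shuffled), y)
```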
    Review of Research on Application of Vision Transformer in Medical Image Analysis
    SHI Lei, JI Qingyu, CHEN Qingwei, ZHAO Hengyi, ZHANG Junxing
    Computer Engineering and Applications    2023, 59 (8): 41-55.   DOI: 10.3778/j.issn.1002-8331.2206-0022
    Abstract508)      PDF(pc) (869KB)(316)       Save
    The deep self-attention network(Transformer) has a natural ability to model the global features and long-range correlations of its input, which is strongly complementary to the inductive bias of convolutional neural networks(CNN). Inspired by its great success in natural language processing, Transformer has been widely introduced into various computer vision tasks, especially medical image analysis, and has achieved remarkable performance. This paper first introduces the typical work of vision Transformer on natural images, and then organizes and summarizes related work according to different lesions or organs in the subfields of medical image segmentation, medical image classification and medical image registration, focusing on the implementation ideas of representative work. Finally, current research is discussed and future directions are pointed out. The purpose of this paper is to provide a reference for further in-depth research in this field.
    Reference | Related Articles | Metrics
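    Applying a Transformer to images starts by tokenising the image into flattened patches, which then attend to one another globally. A minimal sketch of that patch-splitting step (single channel, image side assumed divisible by the patch size, no learned projection):

```python
def to_patches(img, p):
    """Split an HxW single-channel image (list of rows) into flattened
    pxp patches in raster order -- the tokenisation that lets a
    Transformer attend across the whole image."""
    h, w = len(img), len(img[0])
    patches = []
    for i in range(0, h, p):
        for j in range(0, w, p):
            patches.append([img[r][c]
                            for r in range(i, i + p)
                            for c in range(j, j + p)])
    return patches

# A 2x2 image as a single 2x2 patch, or as four 1x1 patches.
print(to_patches([[1, 2], [3, 4]], 2))
```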
    Lightweight Masked Face Recognition Algorithm Incorporating Attention Mechanism
    YE Zixun, ZHANG Hongying, HE Yujun
    Computer Engineering and Applications    2023, 59 (3): 166-174.   DOI: 10.3778/j.issn.1002-8331.2206-0222
    Abstract497)      PDF(pc) (707KB)(98)       Save
    The global COVID-19 pandemic has made wearing masks a norm in people’s lives, and this preventive measure brings new challenges to face recognition algorithms. To address this problem, this paper proposes a lightweight masked face recognition algorithm. Firstly, the network introduces GhostNet as the backbone feature extraction network to improve recognition speed. Secondly, a FocusNet feature enhancement network incorporating a spatial attention mechanism is proposed to make the model focus on the upper half of the face, which is not covered by the mask. Then, to overcome the shortage of masked face datasets, a data augmentation method using a 3D face mesh is proposed to add face masks. Finally, the experimental results show that, compared with the benchmark model, the proposed model reduces the number of model parameters by 84%, while the AP of masked face recognition increases by 4.29 percentage points, better balancing speed and accuracy.
    Reference | Related Articles | Metrics
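    The effect of spatial attention that focuses on the unmasked upper face can be illustrated with a hand-set spatial weighting over a feature map; the weights here are fixed and hypothetical, whereas the paper's attention learns them from data:

```python
def upper_face_weighting(feat_map, upper_weight=1.0, lower_weight=0.2):
    """Emphasise the upper half of a face feature map (eyes/forehead)
    and suppress the masked lower half -- the behaviour a learned
    spatial attention converges toward for masked faces."""
    h = len(feat_map)
    return [[v * (upper_weight if i < h // 2 else lower_weight) for v in row]
            for i, row in enumerate(feat_map)]

print(upper_face_weighting([[1.0], [1.0]]))
```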
    Implementation of Densified Mapping of Monocular ORB-SLAM for Embedded Platform
    MA Jingxuan, WANG Hongyu, CAO Yan, QIAO Wenchao, HAN Jiaozhi, WU Changxue
    Computer Engineering and Applications    2022, 58 (16): 213-218.   DOI: 10.3778/j.issn.1002-8331.2104-0139
    Abstract497)      PDF(pc) (946KB)(97)       Save
    Existing methods cannot satisfy the requirements of high-precision, fast localization and mapping for indoor robots. The ORB-SLAM3(oriented fast and rotated BRIEF-simultaneous localization and mapping 3) system, which has three parallel threads for tracking, mapping and loop closing, is the basis of the proposed 3D dense mapping algorithm. Key frames meeting the requirements are resampled, and poses are updated in the tracking phase, local BA(bundle adjustment) and full BA. 3D point clouds are generated from the key frames and their corresponding poses, thereby producing the dense map. Experimental results on the TUM datasets show that the positioning speed and root-mean-square error of the system reach 10.8 frames/s and 0.213% respectively when running on the Jetson AGX Xavier embedded platform. The high precision and speed of the established system are thus verified, and it can meet the requirements of indoor robot localization and mapping.
    Reference | Related Articles | Metrics
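    Generating the 3D point cloud from key frames and poses rests on pinhole back-projection of each pixel with its depth; a sketch in the camera frame (intrinsics are illustrative; applying the keyframe pose to each result yields map points in the world frame):

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection: lift pixel (u, v) with depth d to a 3D
    point in the camera frame using focal lengths (fx, fy) and the
    principal point (cx, cy)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# The principal point back-projects straight down the optical axis.
print(backproject(320, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0))
```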
    Overview of GNSS Spoofing Jamming Detection
    ZHOU Yan, WANG Shanliang, YANG Wei, YI Jiong, ZHANG Shicang, CAI Chenglin
    Computer Engineering and Applications    2022, 58 (11): 12-22.   DOI: 10.3778/j.issn.1002-8331.2201-0055
    Abstract496)      PDF(pc) (760KB)(164)       Save
    In recent years, with the wide application of satellite navigation systems in military monitoring, precision agriculture, traffic monitoring, resource exploration, disaster assessment and other fields, improving the safety and robustness of satellite navigation systems has become a research hotspot. This paper first introduces the principle and classification of satellite navigation spoofing jamming. Following the processing chain from satellite signal generation to the final positioning and navigation solution, and based on a statistical review of existing work, current spoofing detection techniques are divided into six categories:those based on navigation data information, spatial processing, RF front-end processing, baseband digital signal processing, positioning and navigation results, and machine learning, and the performance of the detection methods in each category is compared. Finally, future spoofing detection is prospected from the two aspects of real-time detection and comprehensive detection.
    Reference | Related Articles | Metrics
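    Detectors based on baseband signal processing often watch for abrupt carrier-to-noise-density(C/N0) anomalies, since a spoofer switching on typically raises received power suddenly. A toy sketch with illustrative window and threshold values:

```python
def cn0_anomaly(cn0_series, window=5, jump_db=10.0):
    """Flag the first sample whose C/N0 jumps more than jump_db above the
    mean of the preceding window -- a simple spoofing cue; real detectors
    combine many such statistics."""
    for i in range(window, len(cn0_series)):
        baseline = sum(cn0_series[i - window:i]) / window
        if cn0_series[i] - baseline > jump_db:
            return i
    return None

# A sudden 15 dB jump at index 5 is flagged; a flat series is not.
print(cn0_anomaly([40.0] * 5 + [55.0]))
```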
    Overview of Human Behavior Recognition Based on Deep Learning
    DENG Miaolei, GAO Zhendong, LI Lei, CHEN Si
    Computer Engineering and Applications    2022, 58 (13): 14-26.   DOI: 10.3778/j.issn.1002-8331.2201-0096
    Abstract485)      PDF(pc) (676KB)(234)       Save
    Human behavior recognition aims to retrieve and identify target behaviors in surveillance video, and is a research hotspot in the field of artificial intelligence. Human behavior recognition algorithms based on traditional methods have shortcomings such as heavy dependence on sample data and susceptibility to environmental noise. To solve these problems, many deep learning-based human behavior recognition algorithms have been proposed for different application scenarios. Firstly, traditional feature extraction methods and deep learning-based feature extraction methods for the human behavior recognition task are introduced. Secondly, deep learning-based human behavior recognition algorithms are summarized in terms of performance and application. The ideas behind methods based on 3D convolutional neural networks, hybrid networks, two-stream convolutional neural networks and few-shot learning(FSL), and their performance on the UCF101 and HMDB51 datasets, are analyzed. Thirdly, the advantages, disadvantages and effectiveness of mainstream model transfer methods are summarized. Finally, the shortcomings of existing deep learning-based human behavior recognition algorithms are summarized, the possibility that FSL algorithms represented by meta-learning and Transformer will become mainstream in future models is discussed, and the future development of deep learning-based human behavior recognition is prospected.
    Reference | Related Articles | Metrics
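    The two-stream approach mentioned above runs a spatial (RGB) stream and a temporal (optical-flow) stream and commonly combines them by late fusion of per-class scores; a minimal sketch (the weighting is a hypothetical choice):

```python
def two_stream_fusion(rgb_scores, flow_scores, w=0.5):
    """Late fusion of the spatial and temporal streams: a weighted
    average of the two streams' per-class scores, so appearance and
    motion evidence both contribute to the final prediction."""
    return [w * r + (1 - w) * f for r, f in zip(rgb_scores, flow_scores)]

# Equal weighting averages the two streams' class scores.
print(two_stream_fusion([1.0, 0.0], [0.0, 1.0]))
```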
    DTZH1505:Large Scale Open Source Mandarin Speech Corpus
    WANG Dong, WANG Liyuan, WANG Daliang, QI Hongwei
    Computer Engineering and Applications    2022, 58 (11): 295-301.   DOI: 10.3778/j.issn.1002-8331.2112-0333
    Abstract480)      PDF(pc) (994KB)(89)       Save
    In recent years, deep learning has made breakthroughs in the field of speech recognition, pushing forward the wide application of speech recognition technology in daily life. Further optimization of speech recognition models needs to be supported by larger-scale labeled data. However, current open-source audio datasets are still too small, and their corpora are mostly written-language, news-style long texts. Targeting popular speech recognition applications such as human-computer interaction and intelligent customer service, this paper builds and releases through crowdsourcing the largest open Chinese Mandarin speech corpus to date, DTZH1505. The dataset records the natural speech of 6,408 speakers from 8 major Chinese dialect regions and 33 provinces, totaling 1,505 hours, and covers scenarios such as social networking, human-computer interaction, intelligent customer service and in-vehicle commands. It can be widely used in research on corpus linguistics, conversation analysis, speech recognition and speaker recognition. This paper implements a series of benchmark speech recognition experiments, and the results show that, compared with aishell2, a Chinese speech corpus of similar scale, speech recognition models trained on this dataset perform better.
    Reference | Related Articles | Metrics
    Review of Cross-Domain Object Detection Algorithms Based on Depth Domain Adaptation
    LIU Hualing, PI Changpeng, ZHAO Chenyu, QIAO Liang
    Computer Engineering and Applications    2023, 59 (8): 1-12.   DOI: 10.3778/j.issn.1002-8331.2210-0063
    Abstract465)      PDF(pc) (583KB)(322)       Save
    In recent years, object detection algorithms based on deep learning have attracted wide attention due to their high detection performance, and have been successfully applied in many fields such as autonomous driving and human-computer interaction. However, traditional deep learning methods assume that the training set (source domain) and the test set (target domain) follow the same distribution; this assumption rarely holds in practice, which severely reduces the generalization performance of the model. How to align the distributions of the source domain and the target domain so as to improve the generalization of object detection models has become a research hotspot in the past two years. This article reviews cross-domain object detection algorithms. First, it introduces the preliminaries of cross-domain object detection:depth domain adaptation and object detection, surveying each area separately in order to understand the field from its underlying logic. It then introduces the latest developments in cross-domain object detection algorithms from five perspectives:discrepancy-based, adversarial, reconstruction-based, hybrid and other methods, and sorts out the research context of each category. Finally, the article summarizes cross-domain object detection algorithms and looks forward to their development trends.
    Reference | Related Articles | Metrics
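    Adversarial domain alignment typically inserts a gradient reversal layer between the feature extractor and a domain classifier: identity in the forward pass, sign-flipped gradient in the backward pass. A framework-free sketch of the two passes (function names are hypothetical; real implementations hook into an autograd engine):

```python
def grl_forward(x):
    """Gradient reversal layer, forward pass: pass features through
    unchanged to the domain classifier."""
    return x

def grl_backward(grad, lam=1.0):
    """Backward pass: negate and scale the gradient, so minimizing the
    domain classifier's loss pushes the feature extractor to produce
    domain-indistinguishable (aligned) features."""
    return [-lam * g for g in grad]

print(grl_forward([1.0, 2.0]), grl_backward([2.0], lam=0.5))
```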
    Survey on Emotion Recognition in Conversation
    CHEN Xiaoting, LI Shi
    Computer Engineering and Applications    2023, 59 (3): 33-48.   DOI: 10.3778/j.issn.1002-8331.2207-0417
    Abstract464)      PDF(pc) (681KB)(234)       Save
    Emotion recognition in conversation(ERC) is a hot research topic in the field of affective computing, which aims to detect the emotion category of each utterance in a dialogue. It has important research significance for dialogue understanding and dialogue generation, and wide practical application value in fields such as social media analysis, recommendation systems, healthcare and human-computer interaction. With the continuous innovation and development of deep learning technology, emotion recognition in conversation has attracted more and more attention from academia and industry, and it is now necessary to summarize these research results in an overview so as to better support follow-up work. The research results in this field are comprehensively sorted out from the perspectives of problem definition, problem analysis, research methods and mainstream datasets, and the development of dialogue emotion recognition is reviewed and analyzed. Compared with video and audio, dialogue text contains more information, so this paper focuses on textual dialogue emotion recognition methods, especially those based on deep learning. Finally, based on the current research status, this paper summarizes the open problems in the field of dialogue emotion recognition and its future development trends.
    Reference | Related Articles | Metrics
    Summary of Fault Diagnosis Methods for Rolling Bearings Under Variable Working Conditions
    HU Chunsheng, LI Guoli, ZHAO Yong, CHENG Fangjuan
    Computer Engineering and Applications    2022, 58 (18): 26-42.   DOI: 10.3778/j.issn.1002-8331.2202-0008
    Abstract463)      PDF(pc) (778KB)(166)       Save
    In the context of intelligent manufacturing, the working conditions of rotating machinery are more compound and its operating conditions more severe, placing greater demands on the condition monitoring and fault diagnosis of equipment. Under variable working conditions, bearing vibration signals exhibit characteristics such as varying amplitude, pulsating impact intervals, non-constant sampling phase and noise pollution, which limit the application of traditional rolling bearing fault diagnosis methods. For bearing fault diagnosis under variable working conditions, signal demodulation and analysis methods with manually extracted features such as order tracking, time-frequency analysis, random vibration and chaos theory, deep learning methods represented by convolutional neural networks, autoencoders and deep belief networks, and transfer learning methods have been developed. This paper reviews progress in the field of variable-condition bearing fault diagnosis over the past five years, introduces several current mainstream methods in detail from the perspectives of algorithm principle, algorithm optimization and practical application, discusses the advantages, shortcomings and application scenarios of the various algorithms, and points out directions for subsequent research.
    Reference | Related Articles | Metrics
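    Signal demodulation methods for bearing diagnosis start from the amplitude envelope of the vibration signal, which exposes fault impact intervals. A crude pure-Python sketch using rectification plus a moving average (real pipelines usually use the Hilbert transform and then inspect the envelope spectrum):

```python
def envelope(signal, window=5):
    """Crude amplitude envelope: full-wave rectification followed by a
    centered moving average. Peaks in the result mark the periodic
    impacts a bearing fault produces."""
    rect = [abs(s) for s in signal]
    out = []
    for i in range(len(rect)):
        lo = max(0, i - window // 2)
        hi = min(len(rect), i + window // 2 + 1)
        out.append(sum(rect[lo:hi]) / (hi - lo))
    return out

# A pure oscillation has a flat envelope.
print(envelope([1.0, -1.0, 1.0, -1.0, 1.0]))
```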