Contents of the Pattern Recognition and Artificial Intelligence section in this journal

Demonstration Learning Method for Robotic Actions and Strategies Based on the Idea of Constructing a Primitive Library
    LI Tiejun, LIU Jiaqi, LIU Jinyue, JIA Xiaohui
    Computer Engineering and Applications    2024, 60 (8): 90-98.   DOI: 10.3778/j.issn.1002-8331.2211-0261
In order to solve the problems of optimizing demonstration data and of storing and invoking actions and task strategies in robot demonstration learning, a demonstration learning method based on a primitive library is proposed. For action learning, an expert drags the manipulator through the desired actions to obtain demonstration data; a Gaussian mixture model and Gaussian mixture regression are used to improve the data quality, and the dynamic motion primitive algorithm converts the final demonstration data into the weights of basis functions. For strategy learning, task steps are created as action primitives, the obtained weights are added to the primitives, a primitive card containing the task execution strategy is built, and the primitives are stored in a primitive library. When a task is executed, primitives are called from the library in sequence: a YOLOv5 object detection network and an AlexNet image classification network detect target information to match actions and to generalize new actions from the characteristics of the original ones. The method thus learns actions and stores strategies from demonstration, and combines appropriate actions to complete tasks according to the actual goals. In an experiment on a steel-bar binding scene, 5 action primitives are created and 10 basic actions are learned through expert teaching; the robot successfully completes the binding task at the intersections of horizontal and vertical reinforcement using the action primitive library.
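The conversion of a demonstration into dynamic motion primitive (DMP) basis-function weights mentioned above can be illustrated with a minimal one-dimensional sketch. This is a generic DMP fitting routine, not the authors' implementation; the gains, the number of basis functions and the width heuristic are assumed values.

```python
import numpy as np

def fit_dmp_weights(y, dt=0.01, n_basis=20, alpha=25.0, beta=6.25, alpha_x=3.0):
    """Fit DMP forcing-term weights for a 1-D demonstrated trajectory y.

    A minimal sketch: the demonstration is differentiated numerically,
    the forcing term is recovered from the transformation system, and
    Gaussian basis-function weights are solved by locally weighted
    regression. Gains follow the common choice beta = alpha / 4.
    """
    T = len(y)
    t = np.arange(T) * dt
    yd = np.gradient(y, dt)
    ydd = np.gradient(yd, dt)
    y0, g = y[0], y[-1]
    tau = t[-1]

    # Canonical system: x(t) = exp(-alpha_x * t / tau)
    x = np.exp(-alpha_x * t / tau)

    # Forcing term implied by the demonstration
    f_target = tau**2 * ydd - alpha * (beta * (g - y) - tau * yd)

    # Gaussian basis functions spaced along the canonical variable
    c = np.exp(-alpha_x * np.linspace(0, 1, n_basis))   # centers
    h = n_basis / c                                      # widths (heuristic)
    psi = np.exp(-h * (x[:, None] - c[None, :])**2)      # (T, n_basis)

    # Locally weighted regression, one weight per basis function
    s = x * (g - y0)
    w = np.empty(n_basis)
    for i in range(n_basis):
        num = np.sum(s * psi[:, i] * f_target)
        den = np.sum(s * psi[:, i] * s) + 1e-10
        w[i] = num / den
    return w

# Example: weights for a smooth reaching-like motion
y_demo = np.sin(np.linspace(0, np.pi / 2, 200))
weights = fit_dmp_weights(y_demo)
```

A weight vector of this kind is what the method stores in each primitive card; replaying the primitive integrates the DMP equations with these weights, and changing the goal g generalizes the motion.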
E-TUP: Joint Knowledge Graph Learning Recommendation Method Incorporating E-CP and TUP
    ZHAO Bo, WANG Yujia, NI Ji
    Computer Engineering and Applications    2024, 60 (8): 99-109.   DOI: 10.3778/j.issn.1002-8331.2211-0464
At present, most methods that introduce knowledge graphs into recommendation systems only introduce known surface entities, without predicting and mining the intrinsic relationships of the graph, and thus cannot exploit its hidden relationships. In this paper, the joint learning recommendation model E-TUP (enhance towards understanding of user preference) is proposed to address this problem, and E-CP (enhance canonical polyadic) is used to complete the knowledge graph and deliver more complete information. A storage-space negative sampling method is used to store and update high-quality negative triples during training, improving the quality of negative triples in knowledge graph completion. Experimental results on link prediction show that the storage-space approach improves the link prediction accuracy of the E-TUP model by up to 10.3% compared to existing models. Recommendation experiments on the MovieLens-1m and DBbook2014 datasets achieve the best results on several evaluation metrics, with improvements of up to 5.5%, indicating that E-TUP can effectively exploit the hidden relationships in the knowledge graph to improve recommendation accuracy. Finally, recommendation experiments based on automotive maintenance data show that E-TUP can effectively recommend relevant knowledge.
    Improved Deeplabv3+ Crop Classification Method Based on Double Attention Fusion
    GUO Jin, SONG Tingqiang, SUN Yuanyuan, GONG Chuanjiang, LIU Yalin, MA Xinglu, FAN Haisheng
    Computer Engineering and Applications    2024, 60 (8): 110-120.   DOI: 10.3778/j.issn.1002-8331.2211-0468
In recent years, convolutional neural networks (CNN) have made new progress in crop classification research, but they show limitations in modeling long-range dependence and are deficient in capturing the global characteristics of crops. In view of these problems, Transformer is introduced into the Deeplab v3+ model, and a parallel-branch structure for crop classification of drone images, the DeepTrans (Deeplab v3+ with Transformer) model, is proposed. DeepTrans combines Transformer and CNN in a parallel way, which is conducive to the effective capture of both global and local features. Transformer is introduced to enhance long-range dependence among information in the image and to improve the extraction of global crop information. A channel attention mechanism and a spatial attention mechanism are added to enhance the sensitivity of Transformer to channel information and the ability of ASPP (atrous spatial pyramid pooling) to capture crop spatial information. The experimental results show that the MIoU of the DeepTrans model reaches 0.812, which is 3.9% higher than that of the Deeplab v3+ model. The accuracy of the model in classifying five crops is improved; for sugarcane, corn and banana, which are easily misclassified, IoU increases by 2.9%, 4.7% and 13% respectively. The DeepTrans model thus has a better segmentation effect in the internal filling and global prediction of crop classification images, which helps to monitor the planting structure and scale of farmland crops more timely and accurately.
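The dual attention added to DeepTrans can be sketched with a CBAM-style channel/spatial attention pair in PyTorch. This is an illustrative stand-in, not the paper's exact modules; the reduction ratio and kernel size are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze global context per channel, then reweight the channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
    def forward(self, x):                      # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))     # average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))      # max-pooled descriptor
        w = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * w

class SpatialAttention(nn.Module):
    """Reweight spatial positions from pooled channel statistics."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
    def forward(self, x):
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

feat = torch.randn(2, 256, 32, 32)             # e.g. an ASPP output
feat = SpatialAttention()(ChannelAttention(256)(feat))
```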
    Approximate Markov Blanket Feature Selection Method Based on Lasso Fusion
    LIU Ming, DU Jianqiang, LI Zhiqin, LUO Jigen, NIE Bin, ZHANG Mengting
    Computer Engineering and Applications    2024, 60 (8): 121-130.   DOI: 10.3778/j.issn.1002-8331.2212-0094
In feature selection, approximate Markov blankets are often used to judge redundant features, but the redundant features they identify are not identical; deleting them directly may therefore cause information loss and reduce model accuracy. To address this, an approximate Markov blanket feature selection method based on Lasso fusion is proposed for high-dimensional, small-sample traditional Chinese medicine metabonomics data. The method has two stages. In the first stage, irrelevant features are filtered out by measuring feature relevance with the maximal information coefficient. In the second stage, approximate Markov blankets are used to construct groups of similar features, Lasso evaluates the influence of the features within each group, and redundant features are removed iteratively. Experimental results show that the algorithm reduces the loss of useful information, removes irrelevant and redundant features, and improves model accuracy and stability.
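A minimal sketch of the two-stage selection follows. Mutual information is used here as a stand-in for the maximal information coefficient (true MIC would need a library such as minepy), and correlation grouping is a rough proxy for approximate-Markov-blanket construction; both thresholds are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LassoCV

# High-dimensional, small-sample stand-in data
X, y = make_classification(n_samples=80, n_features=200, n_informative=10,
                           random_state=0)

# Stage 1: relevance filter (mutual information standing in for MIC)
mi = mutual_info_classif(X, y, random_state=0)
relevant = np.where(mi > np.percentile(mi, 70))[0]

# Stage 2: group strongly correlated survivors (a rough proxy for
# approximate-Markov-blanket similarity), keep the feature with the
# largest Lasso coefficient in each group, drop the rest as redundant.
lasso = LassoCV(cv=5, random_state=0).fit(X[:, relevant], y)
coef = np.abs(lasso.coef_)
corr = np.abs(np.corrcoef(X[:, relevant], rowvar=False))

selected, dropped = [], set()
for i in np.argsort(-coef):                     # strongest features first
    if i in dropped:
        continue
    selected.append(relevant[i])
    dropped.update(np.where(corr[i] > 0.8)[0])  # prune its redundant group
print("kept features:", selected)
```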
Lightweight Cross-Modal Re-Identification Network Combined with Data Augmentation
    CAO Ganggang, WANG Banghai, SONG Yu
    Computer Engineering and Applications    2024, 60 (8): 131-139.   DOI: 10.3778/j.issn.1002-8331.2212-0100
Among existing cross-modal re-identification methods, research on lightweight networks is scarce. Considering the hardware deployment requirements of lightweight networks, a new cross-modal re-identification lightweight network is proposed. Based on OSNet, the feature extractor and feature embedder are split. At the same time, data augmentation operations are used to make the most of limited datasets and improve network robustness, and the hard triplet loss is improved to further reduce computation and the difference between modalities, thereby improving identification accuracy. The network is lightweight, simple in structure and remarkable in effect. In the all-search mode of the SYSU-MM01 dataset, the rank-1 and mAP of the proposed method reach 65.56% and 61.36% respectively, with only 1.92×106 parameters.
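The hard triplet loss that the network improves upon has a standard batch-hard form, sketched below in PyTorch; the paper's modified variant differs in details not given in the abstract.

```python
import torch

def batch_hard_triplet_loss(emb, labels, margin=0.3):
    """Standard batch-hard triplet loss: for each anchor, take its
    hardest (farthest) positive and hardest (closest) negative
    within the mini-batch."""
    dist = torch.cdist(emb, emb)                       # (N, N) pairwise L2
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (N, N) same-identity
    eye = torch.eye(len(labels), dtype=torch.bool, device=emb.device)

    hardest_pos = (dist * (same & ~eye)).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()

# Toy batch: 4 identities, 2 samples each
emb = torch.randn(8, 128, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = batch_hard_triplet_loss(emb, labels)
loss.backward()
```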
Model Robustness Enhancement Algorithm with Scale-Invariant Condition Number Constraint
    XU Yangyu, GAO Baoyuan, GUO Jielong, SHAO Dongheng, WEI Xian
    Computer Engineering and Applications    2024, 60 (8): 140-147.   DOI: 10.3778/j.issn.1002-8331.2212-0114
Deep neural networks are vulnerable to adversarial examples, which threatens their application in safety-critical scenarios. Based on the explanation that adversarial examples arise from the highly linear behavior of neural networks, a model robustness enhancement algorithm based on a scale-invariant condition number constraint is proposed. Firstly, the norms of all weight matrices are calculated during adversarial training, and the scale-invariant constraint term is obtained through the logarithmic function. Secondly, this condition number constraint term is incorporated into the outer optimization of adversarial training, and the condition numbers of all weight matrices are iteratively reduced through backpropagation, so that the linear transformations of the neural network operate in a well-conditioned high-dimensional weight space, improving robustness against adversarial perturbations. The algorithm is suitable for visual models of both convolutional and Transformer architectures. It not only significantly improves robust accuracy against white-box attacks such as PGD and AutoAttack, but also effectively enhances robustness against black-box attacks including the square attack. When the proposed constraint is incorporated during adversarial training of a Transformer-based image classification model, the condition numbers of the weight matrices drop by 20.7% on average, and robust accuracy against PGD attacks increases by 1.16 percentage points. Compared with similar methods such as Lipschitz constraints, the proposed method also improves accuracy on clean examples and alleviates the poor generalization caused by adversarial training.
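The constraint can be sketched as a penalty added to the training loss: the log condition number of each weight matrix, computed from its singular values. This is a minimal PyTorch illustration of the idea, not the paper's implementation; the penalty weight is an assumed hyperparameter, and the inner adversarial-example generation loop is omitted.

```python
import torch
import torch.nn as nn

def condition_number_penalty(model, eps=1e-12):
    """Sum of log condition numbers over all weight matrices.
    log(cond(W)) = log(sigma_max) - log(sigma_min) is scale-invariant:
    multiplying W by a constant leaves the penalty unchanged."""
    penalty = 0.0
    for p in model.parameters():
        if p.dim() < 2:
            continue
        W = p.flatten(1)                 # treat conv kernels as matrices
        s = torch.linalg.svdvals(W)      # singular values, descending
        penalty = penalty + torch.log(s[0] + eps) - torch.log(s[-1] + eps)
    return penalty

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
# In real adversarial training, x would be replaced by PGD-perturbed x_adv
loss = nn.functional.cross_entropy(model(x), y) \
       + 1e-3 * condition_number_penalty(model)
loss.backward()
```

Because log(sigma_max) - log(sigma_min) is unchanged when W is multiplied by a constant, the penalty constrains conditioning without penalizing the overall weight scale, which is the scale invariance the title refers to.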
    Algorithm Research Based on Multi-Feature Fusion of EEG Signals with Convolutional Neural Networks
    SONG Shilin, ZHANG Xuejun
    Computer Engineering and Applications    2024, 60 (8): 148-155.   DOI: 10.3778/j.issn.1002-8331.2212-0301
In order to address the low classification accuracy of motor imagery electroencephalogram (EEG) signals, a feature extraction algorithm based on the fusion of sample entropy and common spatial pattern (CSP) features is proposed. The algorithm first performs wavelet packet decomposition on the raw EEG signal and selects the components containing the μ and β rhythms for reconstruction. The sample entropy and CSP features of the reconstructed signal are then extracted separately. These two features are fused into a new feature vector, which is recognized by a one-dimensional convolutional neural network designed in this paper to obtain the classification result. The proposed method achieves a classification accuracy of 91.66% on the 2003 BCI Dataset III and an average classification accuracy of 85.29% on the 2008 BCI Dataset A. Compared with multi-feature fusion algorithms proposed in recent literature, the accuracy is improved by 7.96 percentage points.
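Sample entropy, one of the two fused features, has a compact standard definition, sketched below with NumPy; the tolerance r = 0.2 times the standard deviation and m = 2 are conventional choices, not necessarily the paper's.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy SampEn(m, r) of a 1-D signal: the negative log of
    the conditional probability that subsequences matching for m points
    (within tolerance r, Chebyshev distance) also match for m + 1."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()

    def count_matches(m):
        templates = np.array([x[i:i + m] for i in range(len(x) - m)])
        d = np.max(np.abs(templates[:, None] - templates[None, :]), axis=2)
        return (d <= r).sum() - len(templates)   # exclude self-matches

    B, A = count_matches(m), count_matches(m + 1)
    return -np.log(A / B)

rng = np.random.default_rng(0)
print(sample_entropy(rng.standard_normal(300)))  # higher = more irregular
```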
    Joint Entity Relation Extraction Model Based on Interactive Attention
    HAO Xiaofang, ZHANG Chaoqun, LI Xiaoxiang, WANG Darui
    Computer Engineering and Applications    2024, 60 (8): 156-164.   DOI: 10.3778/j.issn.1002-8331.2301-0154
The quality of entity-relation triple extraction directly affects the later construction of knowledge graphs. Traditional pipeline and joint extraction models do not effectively model the semantic features at the sentence level and the relation level, which limits model performance. To this end, a joint entity and relation extraction model, RSIAN, is proposed that fuses sentence-level and relation-level semantic features, learning their higher-order semantic associations through an interactive attention network to enhance the interaction between sentences and relations and assist extraction decisions. On the Chinese tourism dataset (TDDS) constructed in this paper, the precision, recall and F1 values reach 0.872, 0.760 and 0.812 respectively, all outperforming current mainstream models. To further validate joint extraction performance in English, experiments are conducted on the publicly available NYT and WebNLG datasets. The F1 values increase by 0.014 and 0.013 over the baseline RSAN model, and the model also outperforms the baseline in the analysis experiments on overlapping triples.
    Bi-Bi-Modality with Bi-Gated Fusion in Multimodal Sentiment Analysis
    LIU Qingwen, Mairidan·Wushouer, Gulanbaier·Tuerhong
    Computer Engineering and Applications    2024, 60 (8): 165-172.   DOI: 10.3778/j.issn.1002-8331.2302-0088
In order to balance the uneven distribution of emotional information across modalities and obtain a deeper multimodal emotional representation, this paper proposes a bi-bi-modality method with bi-gated fusion for multimodal sentiment analysis (BBBGF). In fusing the text-vision and text-audio modality pairs, the dominant position of the text modality among the three modalities is fully considered. At the same time, dual fusion is used to obtain deeper multimodal emotional interaction information. In the first fusion, a fusion gate decides how much knowledge from the supplementary modality is added to the main modality, yielding two bi-modality hybrid knowledge matrices. In the second fusion, considering the redundant and repeated information in these two matrices, a selection gate selects effective, non-repeating emotional information as the final knowledge. On the public dataset CMU-MOSEI, the accuracy and F1 value of binary sentiment classification reach 86.2% and 86.1% respectively, showing good robustness and advancement.
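The fusion gate described above can be sketched as a generic gated additive fusion layer in PyTorch; the dimensions and the element-wise gating form are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusionGate(nn.Module):
    """Generic gated fusion: a sigmoid gate computed from both inputs
    decides, element-wise, how much of the supplementary modality is
    added to the dominant (here: text) representation."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, main, supp):
        g = torch.sigmoid(self.gate(torch.cat([main, supp], dim=-1)))
        return main + g * supp

text = torch.randn(4, 50, 256)    # (batch, seq, dim) text features
audio = torch.randn(4, 50, 256)   # temporally aligned audio features
fused_text_audio = FusionGate(256)(text, audio)
```

A second, analogous gate over the two resulting bi-modality matrices would play the role of the selection gate.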
    Multiview Interaction Learning Network for Multimodal Aspect-Level Sentiment Analysis
    WANG Xuyang, PANG Wenqian, ZHAO Lijie
    Computer Engineering and Applications    2024, 60 (7): 92-100.   DOI: 10.3778/j.issn.1002-8331.2210-0288
Previous multimodal aspect-level sentiment analysis methods only use the general text and picture representations from pre-trained models, which are insensitive to the correlation between aspects and opinion words and cannot dynamically weigh the contribution of picture information to word representations, so they cannot fully capture the correlations between the modalities and the aspects. Aiming at these problems, a multiview interaction learning network is proposed. To make full use of the global features of the text in multimodal interaction, sentence features are extracted from the context view and the syntax view respectively, and the relationships among text, picture and aspect are modeled to realize multimodal interaction. At the same time, the interactive representations of different modalities are fused to dynamically obtain the contribution of visual information to each word in the text, and the correlations between modalities and aspects are fully extracted. Finally, the sentiment classification results are obtained through a fully connected layer and a Softmax layer. Experiments on two datasets show that this model can effectively enhance multimodal aspect-level sentiment classification.
Recommendation Method for Reducing Unrelated Neighborhoods by Combining Item Attribute Collaboration Signals
    ZHAO Wentao, XUE Saili, LIU Tiantian
    Computer Engineering and Applications    2024, 60 (7): 101-107.   DOI: 10.3778/j.issn.1002-8331.2211-0042
In recommendation systems, knowledge graphs (KG) are used as auxiliary information to improve algorithm performance and interpretability. However, when aggregating multi-hop neighbors, existing methods usually aggregate and propagate the information of all entities. Not all information in a KG helps recommendation, and when neighborhood information is aggregated indiscriminately, entity embeddings are interfered with by unrelated entities. Aiming at these problems, this paper proposes a model that combines item attribute cooperative signals with a strategy for screening highly relevant neighborhoods (RUNCS) to improve recommendation. Specifically, users who have clicked on the same item are first identified as similar neighbors, and the cooperative set of item attributes is obtained by combining the items clicked by similar neighbors with the item attributes in the KG. Secondly, item attribute similarity is calculated to obtain a correlation score, which is used to screen highly correlated neighbors. Finally, an attention mechanism is used to allocate weights and aggregate the neighborhood information. Experimental results on two benchmark datasets, music and film, show that compared with existing mainstream methods, the AUC of CTR prediction of this model increases by 0.6 to 2.7 percentage points.
    Research on Gesture Recognition Based on Improved YOLOv5 and Mediapipe
    NI Guangxing, XU Hua, WANG Chao
    Computer Engineering and Applications    2024, 60 (7): 108-118.   DOI: 10.3778/j.issn.1002-8331.2308-0097
Existing gesture recognition algorithms suffer from heavy computation and poor robustness. In this paper, a gesture recognition method based on the IYOLOv5-Med (improved YOLOv5 Mediapipe) algorithm is proposed, which combines an improved YOLOv5 algorithm with the Mediapipe method and comprises gesture detection and gesture analysis. For gesture detection, the traditional YOLOv5 algorithm is improved: firstly, the C3 module is reconstructed with FastNet; secondly, the CBS module is replaced with the GhostConv module from GhostNet; thirdly, an SE attention mechanism module is introduced at the end of the backbone network. The improved algorithm has a smaller model size and is more suitable for edge devices with limited resources. For gesture analysis, a Mediapipe-based method is proposed: hand keypoints are detected in the gesture region located by the detection stage, relevant features are extracted, and these are then identified by a naive Bayes classifier. The experimental findings affirm the efficacy of the IYOLOv5-Med algorithm. Compared with the conventional YOLOv5 algorithm, parameters are reduced by 34.5%, computation by 34.9%, and model weight by 33.2%. The final average recognition rate reaches 0.997, and the implementation is relatively simple, giving the method good application prospects.
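The gesture analysis stage can be sketched with the public Mediapipe Hands API and a scikit-learn naive Bayes classifier. Using raw normalized landmark coordinates as the feature vector is an assumption (the paper derives its own features from the keypoints), and the training data below is a hypothetical placeholder.

```python
import cv2
import mediapipe as mp
import numpy as np
from sklearn.naive_bayes import GaussianNB

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

def landmark_features(bgr_image):
    """Return the 21 hand landmarks as a flat (63,) vector, or None.
    Raw normalized (x, y, z) coordinates are an assumed feature choice."""
    result = hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm]).ravel()

# Hypothetical training set: 4 gesture classes, 10 samples each
X = np.random.rand(40, 63)
y = np.repeat(np.arange(4), 10)
clf = GaussianNB().fit(X, y)

# frame = cv2.imread("gesture.jpg")         # cropped by the detector
# feat = landmark_features(frame)
# if feat is not None:
#     print(clf.predict(feat[None, :]))
```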
    Temporal Event Prediction Based on Implicit Relationship of Multiple Sequences
    HAO Zhifeng, LIU Jun, WEN Wen, CAI Ruichu
    Computer Engineering and Applications    2024, 60 (7): 119-127.   DOI: 10.3778/j.issn.1002-8331.2211-0137
Temporal event prediction refers to predicting the next event from historical events, where an event has time and type attributes. Current work focuses on one-sided prediction (event time or event type), which cannot answer more detailed joint questions such as "what will happen, and when". The challenges are as follows: first, event types are very diverse and behavior is often highly sparse, which makes prediction difficult; second, event time and event type belong to two different domains, and combining the information of the two domains is also a challenge. In response, an approach is explored from the perspective of fusing hidden information across multiple sequences. Firstly, based on the observation that certain event sequences exhibit pattern similarity with each other, it models a hidden relationship graph over event sequences and uses the information of neighboring sequences to alleviate behavioral sparsity; secondly, through suitably designed neural network modules, it maps the time-domain and type-domain information of events into a common abstract space, solving the fusion modeling problem of event time and event type. Extensive experiments on several real datasets corroborate that the multiple-sequence deep temporal model outperforms a series of existing benchmark models.
Demand Aware Attention Graph Neural Network for Session-Based Recommendation
    ZHENG Xiaoli, WANG Wei, DU Yuxuan, ZHANG Chuang
    Computer Engineering and Applications    2024, 60 (7): 128-140.   DOI: 10.3778/j.issn.1002-8331.2211-0248
Aiming at the problem that existing graph-based session recommendation methods ignore the noise caused by the uncertainty of user behavior in feedback data and thus cannot accurately and effectively capture user preferences, a demand-aware attention graph neural network for session-based recommendation (DAAGNNSR) is proposed. Firstly, the session data with time series is constructed as a graph, and node embeddings on the graph are learned with a graph neural network. Secondly, the extracted item features are linearly aggregated into a latent user-demand matrix by a demand-aware aggregator to automatically attenuate noise, while a low-rank multi-head attention network interacts with all item features item by item to generate demand-enhanced item representations. Then, joint independent position encoding further analyzes the sequential associations between items, and the resulting position embeddings are linearly fused with the item representations. Finally, a ranked recommendation list is generated by the prediction layer. The model is trained and tested on three common datasets, Diginetica, Tmall and Nowplaying; the experimental results show that its recommendation accuracy is better than other baselines on all metrics: compared with the graph contextual self-attention network for session-based recommendation (GCSAN), NDCG@10 on Diginetica improves by 5.6% and Recall@10 on Tmall by 6.4%; compared with the graph neural network based SR-GNN, Precision@10 on Tmall improves by 5.0%, a significant improvement in recommendation performance.
    Applying Attention Transformer Module to 3D Lip Sequence Identification
    PIAN Xinyang, WANG Yu, ZHANG Jie
    Computer Engineering and Applications    2024, 60 (7): 141-146.   DOI: 10.3778/j.issn.1002-8331.2211-0295
Lip behavior is a newly emerging biometric recognition technology, and 3D lip point cloud sequences have become an important biometric feature for individual identification because they contain the real spatial structure and motion information of the lips. However, the disorder and unstructured nature of 3D point clouds make spatio-temporal feature extraction very difficult. To this end, a deep learning network model based on a point-feature Transformer is proposed for 3D lip sequence identification. The network uses an improved four-layer PointNet++ as its backbone to extract features hierarchically, and an attention Transformer module with dynamic lip features is designed and added behind each PointNet++ layer to learn more spatio-temporal features containing identity information, which helps learn the relevant information among different feature maps and effectively capture contextual information across video sequence frames. Compared with Transformers built on other attention mechanisms, the proposed module has fewer parameters, and experimental results on the S3DFM-FP and S3DFM-VP datasets show that the proposed network model is effective for the identification of 3D lip point cloud sequences. Even on the pose-unconstrained S3DFM-VP dataset, the model shows better performance.
    Conformer-Based Speaker Recognition Model for Real-Time Multi-Scenarios
    XUAN Xi, HAN Runping, GAO Jingxin
    Computer Engineering and Applications    2024, 60 (7): 147-156.   DOI: 10.3778/j.issn.1002-8331.2210-0145
To handle the poor performance of speaker verification systems in scenarios involving cross-domain, long-duration and noisy utterances, a real-time robust speaker recognition model, PMS-Conformer, is designed based on the Conformer. Its architecture is inspired by the state-of-the-art model MFA-Conformer. PMS-Conformer improves on the acoustic feature extractor, the network components and the loss calculation module of MFA-Conformer, yielding a novel and effective acoustic feature extractor and a robust speaker embedding extractor with high generalization capability. PMS-Conformer is trained on the VoxCeleb1&2 datasets and compared with the baselines MFA-Conformer and ECAPA-TDNN in extensive speaker verification experiments. The results show that on VoxMovies with cross-domain utterances, SITW with long-duration utterances, and VoxCeleb-O with noise added to its utterances, the ASV system built with PMS-Conformer is more competitive than those built with MFA-Conformer and ECAPA-TDNN. Moreover, the trainable parameters and real-time factor (RTF) of its speaker embedding extractor are significantly lower than those of ECAPA-TDNN. All evaluation results demonstrate that PMS-Conformer performs well in real-time multi-scenario conditions.
    Image-Guided Augmentation Visual Question Answering Model Combined with Contrastive Learning
    YANG You, YAO Lu
    Computer Engineering and Applications    2024, 60 (7): 157-166.   DOI: 10.3778/j.issn.1002-8331.2211-0447
Aiming at two problems of existing attention-based encoder-decoder visual question answering (VQA) models, an image-guided augmentation VQA model combined with contrastive learning (IGA-CL) is proposed. The first problem is that a single type of image feature contains incomplete visual information; the second is that existing models rely overly on question guidance. To solve the first problem, the dual-feature visual decoder (DFVD) is proposed. Based on the Transformer language encoder, the single image feature is extended into two types, region and grid, and visual information is refined by constructing complementary spatial relations from the relative positions of the two feature types. To solve the second problem, the vision-guided language decoder (VGLD) is proposed, which matches the two decoded image features with the question features twice; within it, parallel gated guided-attention (PGGA) is designed to adaptively correct the guiding proportions of different image features to the question. To obtain more similar mutual information, a contrastive learning loss function is introduced during training, comparing the similarity of different modal features in the hidden space during model reasoning. The proposed model achieves 73.82%, 72.49% and 57.44% overall accuracy on VQA 2.0, COCO-QA and GQA respectively, which is 2.92, 4.41 and 0.8 percentage points better than the MCAN model. Extensive ablation experiments and visualization analysis demonstrate the effectiveness of the proposed model; the results show that it obtains more relevant language-vision information and generalizes better across question types.
    Medical Report Extraction Generation Model Integrated with BioCopy Mechanism
    LIU Lan, TAN Hongye
    Computer Engineering and Applications    2024, 60 (6): 155-162.   DOI: 10.3778/j.issn.1002-8331.2210-0071
Wise information technology of med (WITMED) is a new health care service mode that integrates information technologies such as artificial intelligence, and automatic generation of medical reports is an important task in this field. The task generates semi-structured medical reports from patient self-reports and doctor-patient dialogues. A medical report contains the chief complaint and other sub-parts, as well as a large number of medical terms from the original text. In view of these characteristics, a summary model integrating extraction and abstraction with the BioCopy mechanism is adopted. Firstly, the model extracts key sentences for each sub-part to eliminate the interference of irrelevant information. Then, the BioCopy mechanism is added during generation to copy the medical terms in the key sentences, ensuring the accuracy of the results. Experimental results on the CCL 2021 dataset show that this model is superior to the main baselines and achieves good results.
    Deep Neural Network Channel Pruning Compression Method for Filter Elasticity
    LI Ruiquan, ZHU Lu, LIU Yuanyuan
    Computer Engineering and Applications    2024, 60 (6): 163-171.   DOI: 10.3778/j.issn.1002-8331.2210-0420
Deep neural networks (DNN) have achieved great success in various fields, but their high computation and storage costs make direct deployment to resource-constrained mobile devices difficult. To solve this problem, the importance evaluation of global filters in the network is studied, and a channel pruning compression method based on filter elasticity is proposed to reduce network size. Firstly, the method sets local dynamic thresholds between layers to alleviate the over-pruning of L1-regularized (lasso) sparse training. The output is then multiplied by a channel scaling factor to replace the ordinary convolution layer module. The importance of a global filter is defined by its elastic size, estimated and ranked via the Taylor formula. At the same time, a new iterative filter pruning framework is designed to balance pruning performance against pruning speed. Finally, composite channels are pruned using the improved L1-regularized training and the global filter importance. VGG-16 tested on CIFAR-10 with the proposed method achieves an 80.2% reduction in floating-point operations (FLOPs) and a 97.0% reduction in parameters without significant loss of accuracy, indicating that the method can compress neural networks at a large scale for deployment on resource-constrained terminal devices.
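Taylor-formula importance ranking over channel scaling factors can be sketched as follows: after one backward pass, the first-order Taylor estimate of the loss change from removing a channel is |gamma * dL/dgamma| on that channel's scale factor. This minimal PyTorch illustration uses BatchNorm scale factors as the channel gates; the paper's elasticity definition and iterative framework are not reproduced here.

```python
import torch
import torch.nn as nn

# A toy conv network whose BatchNorm scale factors act as channel gates
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))

x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
nn.functional.cross_entropy(model(x), y).backward()

# First-order Taylor importance per channel: |gamma * dL/dgamma|.
# Removing a channel changes the loss by roughly this magnitude.
scores = []
for name, m in model.named_modules():
    if isinstance(m, nn.BatchNorm2d):
        imp = (m.weight * m.weight.grad).abs()
        scores += [(name, c, v.item()) for c, v in enumerate(imp)]

scores.sort(key=lambda t: t[2])   # global ranking, least important first
print(scores[:5])                 # candidate channels to prune
```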
    Research on Angle-Optimised Grasp Detection Algorithm Based on YOLOv5
    CHEN Chunchao, SUN Donghong
    Computer Engineering and Applications    2024, 60 (6): 172-179.   DOI: 10.3778/j.issn.1002-8331.2210-0499
Aiming at the problems that current robot grasp detection methods predict the grasp angle over an overly discrete set and may produce large angle deviations during grasping, which reduces detection accuracy and can even lead to grasp failure, an improved real-time grasp detection method based on the YOLOv5 neural network model is proposed. Firstly, grasp box coordinates and grasp angles are extracted based on the single-stage object detection model YOLOv5. The grasp angles are then divided more finely, and a circular smooth label is introduced to accommodate the periodicity of angles and establish links between adjacent angles; the YOLOv5 detection head is decoupled and the loss function is optimized to improve detection accuracy. Finally, experimental validation is performed on the Cornell dataset. The results show that, compared with classical grasp detection methods, the proposed algorithm predicts the grasp angle better and improves grasp detection accuracy, achieving 97.5% accuracy and a detection speed of 71 FPS on the Cornell dataset.
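The circular smooth label (CSL) encoding can be sketched in a few lines: each angle becomes a periodic Gaussian over discrete bins, so neighboring bins, including those across the wrap-around, receive partial credit. The bin count and window width below are assumed values.

```python
import numpy as np

def circular_smooth_label(angle_deg, n_bins=180, sigma=4.0):
    """Encode an angle as a periodic Gaussian over discrete bins, so the
    loss treats 179 degrees and 0 degrees as neighbors rather than as
    maximally different classes."""
    bins = np.arange(n_bins)
    target = int(round(angle_deg)) % n_bins
    d = np.minimum(np.abs(bins - target), n_bins - np.abs(bins - target))
    label = np.exp(-d**2 / (2 * sigma**2))
    return label / label.max()

label = circular_smooth_label(178.0)
print(label[[176, 177, 178, 179, 0, 1]])   # mass wraps around 180 -> 0
```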
    Dynamic Dominant Fusion Multimodal Sentiment Analysis Method Based on Autoencoder
    YANG Xi, GUO Junjun, YAN Haining, TAN Kaiwen, XIANG Yan, YU Zhengtao
    Computer Engineering and Applications    2024, 60 (6): 180-187.   DOI: 10.3778/j.issn.1002-8331.2211-0010
In multimodal sentiment analysis, the modality that dominates sentiment determination is dynamic. Traditional methods usually regard the text modality as dominant, ignoring that, owing to differences between modalities, the dominant modality changes from moment to moment. To select the dominant modality dynamically at each moment, this paper proposes a dynamic dominant fusion multimodal sentiment analysis method based on an autoencoder. The method first encodes the single modalities and obtains multimodal fusion features, and an autoencoder maps them into a shared space. In this space, the dominant modality is selected by the correlation between each unimodal representation and the fused one. Finally, the dominant modality information guides multimodal fusion to obtain a robust multimodal representation. Extensive experiments on the multimodal sentiment analysis benchmark CMU-MOSI demonstrate the effectiveness of the proposed method, which outperforms most existing state-of-the-art multimodal sentiment analysis methods.
    Medical Named Entity Recognition Based on Multi-Feature and Co-Attention
    LIU Xinning
    Computer Engineering and Applications    2024, 60 (6): 188-198.   DOI: 10.3778/j.issn.1002-8331.2211-0094
Aiming at the problems that current Chinese medical named entity recognition does not fuse the unique feature information of medical texts, preventing effective improvement of recognition accuracy, and that a single attention mechanism limits entity classification, a Chinese medical named entity recognition method based on multi-feature fusion and a co-attention mechanism is proposed. Firstly, the vector representation of the original medical text is obtained with a pre-trained model, and word-granularity feature vectors are obtained with a bidirectional gated recurrent neural network (BiGRU). Secondly, exploiting the distinctive radical features of medical named entities, an iterated dilated convolutional neural network (IDCNN) is used to extract radical-level feature vectors. Finally, a co-attention network integrates the medical vector features to generate dual correlation features of <Characters-Radicals> pairs, and a conditional random field (CRF) outputs the entity recognition results. Experimental results show that, compared with other entity recognition models, the method achieves higher precision, recall and F1 value on the CCKS dataset; although model complexity increases, performance does not degrade significantly.
    Chinese Named Entity Recognition Methods Combined with Entity Boundary Cues
    HUANG Rong, CHEN Yanping, HU Ying, HUANG Ruizhang, QIN Yongbin
    Computer Engineering and Applications    2024, 60 (6): 199-206.   DOI: 10.3778/j.issn.1002-8331.2211-0119
As a basic task in information extraction, named entity recognition (NER) provides effective support for machine translation, relation extraction and other downstream tasks, and is of great research significance. To tackle the problem of fuzzy entity boundaries in Chinese named entity recognition, a named entity recognition model combining entity boundary cues is proposed. The model consists of three modules: boundary detection, cue generation and entity classification. Firstly, the boundary detection module identifies entity boundaries. Then, the cue generation module generates entity spans from the boundary information, producing a text sequence with boundary cue labels. Through these labels, the model can perceive the entity boundaries in a sentence and learn the semantic dependencies between entity boundaries and context. Finally, the labeled text sequence is fed to the entity classification module, where a Biaffine mechanism enhances the semantic interaction between labels, and the joint prediction of a multilayer perceptron and the Biaffine mechanism gives the entity recognition result. The F1 values of this model on the ACE2005 Chinese dataset and the Weibo dataset reach 90.47% and 73.54% respectively, verifying its effectiveness for Chinese named entity recognition.
    Visual Question Answering Research on Joint Knowledge and Visual Information Reasoning
    SU Zhenqiang, GOU Gang
    Computer Engineering and Applications    2024, 60 (5): 95-102.   DOI: 10.3778/j.issn.1002-8331.2209-0456
As a multimodal task, visual question answering requires fusing and reasoning over the features of different modalities and has important application value. In traditional visual question answering, the answer can be inferred from the visual information of the image alone; however, pure visual information cannot meet the diverse question answering needs of real-world scenarios. Knowledge plays an important role in visual question answering and can effectively assist it; knowledge-based open visual question answering needs to associate external knowledge to achieve cross-modal scene understanding. To better integrate visual information and related external knowledge, a bilinear structure for joint knowledge and visual information reasoning is proposed, with a dual-guided attention module in which image features and question features jointly guide the knowledge representation. Firstly, the model uses a pre-trained vision-language model to obtain the feature representations and visual reasoning information of the question and image. Secondly, a similarity matrix is used to compute the image object regions semantically aligned with the question, and the aligned regional features together with the question features guide the knowledge representation to obtain knowledge reasoning information. Finally, the visual reasoning information and the knowledge reasoning information are fused to produce the final answer. Experimental results on the OK-VQA dataset show that the accuracy of the model is 1.97 and 4.82 percentage points higher than the two baseline methods respectively, verifying the effectiveness of the model.
Unilateral Adversarial Network Algorithm for Cross-Domain Face Liveness Detection
    ZENG Fanzhi, WU Chutao, ZHOU Yan
    Computer Engineering and Applications    2024, 60 (5): 103-111.   DOI: 10.3778/j.issn.1002-8331.2210-0134
In existing cross-domain face liveness detection algorithms, the feature extraction process is prone to overfitting and lacks feature aggregation, resulting in insufficient generalization. To solve this problem, this paper proposes a unilateral adversarial network algorithm for cross-domain face liveness detection. Firstly, grouped convolution and an improved inverted residual structure are fused to replace ordinary convolution, reducing network parameters and enhancing the expression of fine-grained face features, and an adaptive feature normalization module is introduced to emphasize the liveness-related regions in the image and fade the irrelevant background regions. This effectively avoids overfitting of face liveness information and enhances detection across different source domains. Secondly, a channel attention mechanism module is introduced on the basis of NetVLAD as a branch of the feature aggregation network to learn the semantic information of local features in different source domains, effectively enhancing the generalization of liveness classification across source domains. Finally, a two-module fusion network is designed to improve the accuracy of cross-domain face liveness detection in unknown scenes. Experimental results on the OULU-NPU, CASIA-FASD, MSU-MFSD and Idiap Replay-Attack datasets show that the proposed algorithm performs well in the O&C&M to I, O&C&I to M, I&C&M to O and O&M&I to C cross-dataset tests; on O&C&I to M and O&M&I to C, accuracy improves by 0.99 and 0.5 percentage points respectively.
    Multi-View Representation Model for Aspect-Level Sentiment Analysis
    XU Xuefeng, HAN Hu
    Computer Engineering and Applications    2024, 60 (5): 112-121.   DOI: 10.3778/j.issn.1002-8331.2210-0231
Fine-grained sentiment analysis of user comments toward specific aspects is a popular research topic in natural language processing. Given the flexibility of comment sentences in content expression and syntactic structure, the integrated use of lexical, syntactic and semantic knowledge to enhance the feature representation of comment sentences is a major current research direction. On this basis, a graph convolutional network model with multi-view fusion representation is proposed. First, the model learns context-based enhanced representations of comment sentences through self-attention and aspect-specific attention. Second, two different representations based on syntax and semantics are obtained through graph convolution operations using syntactic dependency information and word co-occurrence information respectively. Finally, a hierarchical fusion approach is designed over the three view representations, combining and convolving them to achieve information sharing and complementarity among the views. Experimental results on five publicly available datasets show that the model achieves better performance than existing models.
    Reverse Inference Model for Document-Level Event Extraction
    JI Wanting, MA Yuhang, LU Wenyi, WANG Junlu, SONG Baoyan
    Computer Engineering and Applications    2024, 60 (5): 122-129.   DOI: 10.3778/j.issn.1002-8331.2210-0237
    Event extraction aims to detect event types and extract event arguments from unstructured texts. Existing methods still have limitations when dealing with document-level texts. This is because a document-level text may consist of multiple events, and the event arguments that constitute an event are usually scattered across different sentences. To address the above challenges, this paper proposes a reverse inference model for document-level event extraction (RIDEE). Based on the design without trigger words, RIDEE simplifies the document-level event extraction into two sub-tasks, candidate event argument extraction and event triggering inference, to extract event arguments in parallel and detect event types. In addition, this paper designs an event dependency pool for storing historical events, so that the model can make full use of the dependencies between events when processing the multi-event texts. Experimental results on the public dataset show that RIDEE has better performance in document-level event extraction than the existing event extraction models.
    Bidirectional Interaction Model for Joint Multiple Intent Detection and Slot Filling
    LI Shi, SUN Zhenpeng
    Computer Engineering and Applications    2024, 60 (5): 130-138.   DOI: 10.3778/j.issn.1002-8331.2210-0271
Intent detection and slot filling are the two major tasks of spoken language understanding; they are highly correlated and usually trained jointly. As research on spoken language understanding progresses, it has been found that user utterances in real-life scenarios often contain multiple intents, yet some joint models can only detect a single intent and fail to adequately model the correlation between multiple intents and slots. Since multi-intent information in an utterance can guide slot filling, and slot information can in turn help intent detection, the Label Bi-Interaction model uses a graph attention network to establish a two-way interaction between intents and slots. Specifically, it associates the two tasks bidirectionally so that the model can explore the relationship between multiple intents and slots, and it introduces the label information of the two tasks so that the model can learn the relationship between utterance context and labels. This improves the accuracy of intent detection and slot filling and optimizes the overall performance of spoken language understanding. Experiments show that the model's performance on the two multi-intent datasets MixATIS and MixSNIPS is significantly improved compared to other models.
Personalized Dynamic Ensemble Model for Alzheimer's Disease Auxiliary Diagnosis
    LIANG Haolin, PAN Dan, ZENG An, YANG Baoyao, Xiaowei Song
    Computer Engineering and Applications    2024, 60 (5): 139-145.   DOI: 10.3778/j.issn.1002-8331.2211-0150
Aiming at the problem that most Alzheimer's disease (AD) classification models do not adopt sample-specific strategies, so that personalized differences between samples are easily neglected, a novel AD classification model, the personalized dynamically ensembled convolutional neural network (PDECNN), is proposed. Considering that brain regions degenerate to different degrees in different samples, PDECNN uses an attention net to evaluate the degeneration degree of each brain region for the specific input sample. Based on these estimates, a dynamic ensemble strategy is designed to select and fuse brain region features for AD identification. In addition, by redesigning the loss function, the problem that optimal gradients cannot be obtained for unselected brain regions is solved, further improving classification performance. Experimental results show that, compared with existing AD classification models, the classification accuracy of PDECNN increases by 4%, 11% and 8% in the AD vs. HC (healthy cognition), MCIc (mild cognitive impairment that converts to AD) vs. HC, and MCIc vs. MCInc (mild cognitive impairment that does not convert to AD) experiments respectively. The degenerated brain regions identified by PDECNN also correlate with the clinical manifestations of AD.
    Prompt-Learning Inspired Approach to Unsupervised Sentiment Style Transfer
    CAI Guoyong, LI Anqing
    Computer Engineering and Applications    2024, 60 (5): 146-155.   DOI: 10.3778/j.issn.1002-8331.2211-0317
Text style transfer is the task of regenerating text with desired style properties while preserving the original content. In order to improve transfer quality with non-parallel style corpora, this paper proposes a new method that guides a fill-mask model to rewrite a sentence into the target style. Overall, the approach follows the delete-retrieve-generate style transfer framework, but employs a large unsupervised pre-trained language model and the Transformer architecture. Based on the working principle of the Transformer, the method of filtering style attributes from the source sentence is first improved, and the internal knowledge of the pre-trained model is then mined through prompt learning to generate target-style words. Experiments on two sentiment benchmark datasets show that the method outperforms existing editing methods, with an average relative improvement of more than 14% on the comprehensive metrics.
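Prompting a fill-mask model to produce target-style words can be sketched with the HuggingFace pipeline API. The model choice and the prompt template below are illustrative assumptions, not the paper's configuration.

```python
from transformers import pipeline

# A minimal sketch of prompting a fill-mask model to rewrite a style word.
fill = pipeline("fill-mask", model="bert-base-uncased")

source = "the food was terrible and the service was slow"
# Delete step: mask a detected style-attribute word
masked = source.replace("terrible", fill.tokenizer.mask_token)

# Prompt step: prepend a target-style cue so the blank is filled positively
prompt = f"a very positive review : {masked}"
for cand in fill(prompt, top_k=3):
    print(cand["token_str"], round(cand["score"], 3))
```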
    Data Reconstruction Based on Quantum Generative Adversarial Networks
    JIANG Yida, WANG Mingming
    Computer Engineering and Applications    2024, 60 (5): 156-164.   DOI: 10.3778/j.issn.1002-8331.2211-0363
Data reconstruction using neural networks is an important research topic in artificial intelligence. The generative adversarial network (GAN), a popular artificial intelligence algorithm in recent years, performs well on data reconstruction tasks. Quantum computing, as a new computing paradigm that can accelerate classical computing, is steadily merging with classical artificial intelligence algorithms; among these combinations, the pure quantum generative adversarial network (QGAN) performs well on image-related tasks. However, since the fitting ability of purely quantum models still needs improvement, this paper proposes a hybrid quantum-classical generative adversarial network (Q-CGAN) based on the GAN framework for data reconstruction. The framework exploits classical nonlinearities to improve fitting performance and quantum properties to provide quantum speedups. The reconstruction performance of the hybrid model is verified on the MNIST handwritten digit dataset, and the results show that Q-CGAN reconstructs data better than a pure quantum generator. In addition, the effects of different quantum encoding schemes and different parameterized quantum circuits on reconstruction quality are also studied.
    Oversampling Method for Imbalanced Data Using Credible Counterfactual
    GAO Feng, SONG Mei, ZHU Yi
    Computer Engineering and Applications    2024, 60 (5): 165-171.   DOI: 10.3778/j.issn.1002-8331.2211-0413
A new counterfactual (CF) oversampling method for imbalanced datasets is proposed, which further removes non-credible synthetic samples, aiming to solve the problem that traditional sampling methods cannot make full use of dataset information. Its core idea is to synthesize new samples from the original instance features of the dataset. Compared with traditional oversampling by interpolation, it can fully mine the boundary decision information in the data, providing more useful information to the classifier and improving classification performance. Extensive comparative experiments are carried out on 9 KEEL and UCI imbalanced datasets with 5 different classifiers (SVM, DT, Logistic, RF, AdaBoost) and 4 traditional oversampling methods (SMOTE, B1-SMOTE, B2-SMOTE, ADASYN). The results show that the algorithm achieves higher AUC, F1 and G-mean values and can effectively address the class imbalance problem.
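A counterfactual-style synthesis step with a credibility filter can be sketched as follows. This is an illustrative variant under assumed choices (which features to swap, the 0.8 confidence threshold), not the paper's exact procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

X, y = make_classification(n_samples=300, weights=[0.9], random_state=0)
X_min, X_maj = X[y == 1], X[y == 0]

# Counterfactual-style synthesis: start from a majority instance and
# overwrite an assumed "key" feature subset with the values of its
# nearest minority neighbor, yielding a boundary-adjacent minority sample.
nn_min = NearestNeighbors(n_neighbors=1).fit(X_min)
swap = np.arange(X.shape[1] // 2)          # assumed key-feature subset
_, idx = nn_min.kneighbors(X_maj)
X_syn = X_maj.copy()
X_syn[:, swap] = X_min[idx[:, 0]][:, swap]

# Credibility filter: keep only synthetic points a reference classifier
# confidently assigns to the minority class.
clf = RandomForestClassifier(random_state=0).fit(X, y)
credible = clf.predict_proba(X_syn)[:, 1] > 0.8
X_aug = np.vstack([X, X_syn[credible]])
y_aug = np.concatenate([y, np.ones(credible.sum(), dtype=int)])
print("added", credible.sum(), "credible synthetic minority samples")
```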
    Method for Generating Summary of Judgment Documents Based on Trial Logic Steps
    YU Shuai, SONG Yumei, QIN Yongbin, HUANG Ruizhang, CHEN Yanping
    Computer Engineering and Applications    2024, 60 (4): 113-121.   DOI: 10.3778/j.issn.1002-8331.2209-0142
Judicial summarization oriented to judgment documents is a key technology for improving the ability to analyze them. As the carrier of trial activities, judgment documents accurately present the trial logic of a case. However, current summarization methods only focus on the serialized information of judgment documents, ignore their logical structure, and cannot effectively handle over-long texts and redundant information. A judgment document summary generation method based on trial logic steps is therefore proposed, adopting an "extraction + generation" approach. The extraction part uses multi-label classification to extract four sentence sets, "type, claim, fact and result", according to the logical steps of the people's court trial. The generation part obtains the summary from a fine-tuned T5-PEGASUS model, and the input text of the "fact" part is denoised with a maximum-similarity matching algorithm based on internal knowledge, further improving the summary. Experimental results show that, compared with the mainstream pointer-generator network summarization model, the proposed method improves the F1 scores of ROUGE-1, ROUGE-2 and ROUGE-L by 17.99, 21.24 and 21.86 percentage points respectively, showing that introducing logical structure into judicial summarization improves task performance.
    Speech Emotion Recognition for Imbalanced Datasets
    ZHANG Huiyun, HUANG Heming
    Computer Engineering and Applications    2024, 60 (4): 122-132.   DOI: 10.3778/j.issn.1002-8331.2209-0099
Sample balance is crucial for machine learning: on imbalanced datasets, the importance of certain classes may be higher than their sample counts suggest. This paper studies imbalanced datasets for speech emotion recognition. Firstly, the imbalanced baseline datasets EMODB and IEMOCAP are augmented with different signal-to-noise ratios, constructing the datasets EMODBM and IEMOCAPM. Secondly, six techniques, SMOTE, RandomOverSampler, SMOTEENN, ADASYN, TomekLinks and SMOTETomek, are adopted to resample the baseline datasets and build class-balanced augmented datasets. Thirdly, 21-dimensional low-level descriptor features are extracted from the baseline and augmented datasets. Finally, a novel model, MA-CapsNet, is proposed to validate the effectiveness of the resampling techniques. The results show that all emotion classes are basically balanced after resampling, which makes the learning of MA-CapsNet fairer; in addition, MA-CapsNet is more robust on the resampled datasets.
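The resampling stage maps directly onto the imbalanced-learn package, where all six named techniques share one interface. A minimal sketch with synthetic stand-ins for the 21-dimensional descriptors:

```python
from collections import Counter
from imblearn.combine import SMOTEENN, SMOTETomek
from imblearn.over_sampling import ADASYN, RandomOverSampler, SMOTE
from imblearn.under_sampling import TomekLinks
from sklearn.datasets import make_classification

# Stand-in for 21-dimensional low-level descriptors per utterance
X, y = make_classification(n_samples=500, n_features=21, n_classes=3,
                           n_informative=8, weights=[0.7, 0.2, 0.1],
                           random_state=0)
print("before:", Counter(y))

samplers = [SMOTE(random_state=0), RandomOverSampler(random_state=0),
            SMOTEENN(random_state=0), ADASYN(random_state=0),
            TomekLinks(), SMOTETomek(random_state=0)]
for sampler in samplers:
    X_res, y_res = sampler.fit_resample(X, y)   # common resampling API
    print(type(sampler).__name__, Counter(y_res))
```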
    Reference | Related Articles | Metrics
    Cross-Modality Person Re-identification Combined with Data Augmentation and Feature Fusion
    SONG Yu, WANG Banghai, CAO Ganggang
    Computer Engineering and Applications    2024, 60 (4): 133-141.   DOI: 10.3778/j.issn.1002-8331.2209-0120
    Abstract78)      PDF(pc) (2285KB)(88)       Save
    The difficulty of visible-infrared person re-identification lies in the large modal difference between images. Most existing methods alleviate the modal difference either by generating fake images through generative adversarial networks or by extracting modality-shared features from the original images. However, training a generative adversarial network consumes substantial computational resources and the generated fake images tend to introduce noise, while extracting only modality-shared features can lose important discriminative features. To address these problems, a new cross-modality person re-identification network is proposed. Firstly, automatic data augmentation is used to improve model robustness. Then, instance normalization is used in the network to reduce modal differences. Finally, the pedestrian features of different scales extracted by each layer of the network are organically fused, so that the fused features contain more discriminative features related to pedestrian identity. The proposed method achieves Rank-1/mAP of 69.47%/65.05% in the all-search mode of SYSU-MM01 and 85.73%/77.77% in the visible-to-infrared mode of RegDB, a significant improvement over existing methods.
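    A toy sketch of the two ingredients named above, instance normalization plus multi-scale feature fusion; the channel sizes and fusion rule are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class FusionBranch(nn.Module):
    """Instance normalization suppresses modality-specific style, then features
    from several backbone stages (stand-ins here) are projected and fused."""
    def __init__(self, chs=(64, 128, 256), out_dim=256):
        super().__init__()
        self.ins = nn.ModuleList(nn.InstanceNorm2d(c) for c in chs)
        self.proj = nn.ModuleList(nn.Conv2d(c, out_dim, 1) for c in chs)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, feats):   # feats: list of (B, C_i, H_i, W_i) stage outputs
        fused = sum(self.pool(p(norm(f)))
                    for f, norm, p in zip(feats, self.ins, self.proj))
        return fused.flatten(1)  # (B, out_dim) identity embedding

# Usage with random stand-ins for three backbone stages:
feats = [torch.randn(2, c, s, s) for c, s in [(64, 32), (128, 16), (256, 8)]]
emb = FusionBranch()(feats)      # -> torch.Size([2, 256])
```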
    Reference | Related Articles | Metrics
    Efficient Cross-Domain Transformer Few-Shot Semantic Segmentation Network
    FANG Hong, LI Desheng, JIANG Guangjie
    Computer Engineering and Applications    2024, 60 (4): 142-152.   DOI: 10.3778/j.issn.1002-8331.2209-0156
    Abstract78)      PDF(pc) (2740KB)(73)       Save
    Few-shot semantic segmentation aims to learn target features from only a few labeled samples and complete the semantic segmentation task. The main problems in mainstream research are low training efficiency and the restriction that meta-training and meta-testing occur in the same data domain. For this task, this paper proposes an efficient, cross-domain few-shot semantic segmentation network based on Transformer: SGFNet. In the encoding layer, a weight-shared MixVisionTransformer is used to build a siamese network that extracts the image features of the support set and query set. In the relationship calculation layer, the Hadamard product of the support-set image feature vectors and their corresponding masks is computed to extract the target feature maps, and the relationship between them and the query-set image features is calculated. In the decoding layer, the MLP decoder is improved into a proposed residual decoder that decodes features at different hierarchies to obtain the final segmentation result. Experiments show that the model needs only a single 3090 GPU and 1.5~4.0 h of training on the FSS-1000 dataset to reach the optimal 1-shot mIoU of 87.0%; cross-domain tests on PASCAL-5i and COCO-20i reach performance comparable to non-cross-domain methods, with 1-shot mIoU of 60.4% and 33.0%, respectively, proving that the model is both efficient and cross-domain capable.
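    A minimal sketch of the relationship-calculation step, assuming PyTorch feature maps; masked pooling via the Hadamard product and a cosine relation are one common realization, not necessarily SGFNet's exact operators:

```python
import torch
import torch.nn.functional as F

def relation_map(support_feat, support_mask, query_feat):
    """support_feat: (B, C, H, W); support_mask: (B, 1, H0, W0) binary;
    query_feat: (B, C, Hq, Wq). Returns a (B, Hq, Wq) relation map."""
    # Hadamard product of features and (downsampled) mask isolates the target.
    mask = F.interpolate(support_mask.float(), size=support_feat.shape[-2:],
                         mode="nearest")
    proto = (support_feat * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1)
    proto = proto[:, :, None, None]                       # (B, C, 1, 1) prototype
    # Relation between the target prototype and every query location.
    return F.cosine_similarity(query_feat, proto, dim=1)  # (B, Hq, Wq)
```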
    Reference | Related Articles | Metrics
    Joint Dual-Dimensional User Scheduling for Adaptive Federated Edge Learning
    ZHANG Jiuchuan, PAN Chunyu, ZHOU Tianyi, LI Xuehua, DING Yong
    Computer Engineering and Applications    2024, 60 (4): 153-162.   DOI: 10.3778/j.issn.1002-8331.2209-0459
    Abstract33)      PDF(pc) (2402KB)(31)       Save
    Federated edge learning does not need to transmit local data, which greatly relieves pressure on the uplink while protecting user privacy. In federated edge learning, each intelligent edge device trains a local model on its local dataset and uploads the model parameters to a central server; the central server aggregates the uploaded local parameters to form and update a global model, then sends the updated model back to the edge devices to start a new iteration. However, local model accuracy and local training time significantly affect the global aggregation and model update process. Therefore, an adaptive dynamic batch gradient descent strategy is first proposed, which automatically adjusts the batch size used by gradient descent during local training and optimizes the local model accuracy and the convergence speed of federated learning. Next, to handle the non-IID characteristics of user data, an adaptive dynamic batch gradient descent algorithm combined with a dual-dimensional user scheduling strategy is designed, imposing constraints along the two dimensions of convergence time and data diversity. After training and testing on the MNIST, Fashion-MNIST and CIFAR-10 datasets, the algorithm effectively reduces the aggregation waiting time and further improves the global model accuracy and convergence speed. Compared with gradient descent with fixed batch sizes of 64, 128 and 256, the global model accuracy of this algorithm is increased by 32.4%, 45.2% and 87.5%, respectively, at 100 seconds of running time.
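    An illustrative adaptation rule only; the paper's actual criterion is more involved, and the training loop below simulates the local loss rather than running real SGD:

```python
import math

def adapt_batch(batch, prev_loss, loss, b_min=16, b_max=512):
    # Grow the mini-batch while the loss is still falling (larger batches give
    # steadier updates); shrink it when progress stalls to re-inject gradient noise.
    return min(batch * 2, b_max) if loss < prev_loss else max(batch // 2, b_min)

def local_train_epoch(batch, rnd):
    # Stand-in for one epoch of local SGD on an edge device; returns a fake loss.
    return 1.0 / (rnd + 1) + 0.01 * math.sqrt(batch) / 50

batch, prev = 64, float("inf")
for rnd in range(10):
    loss = local_train_epoch(batch, rnd)
    batch, prev = adapt_batch(batch, prev, loss), loss
    print(f"round {rnd}: loss={loss:.3f}, next batch={batch}")
```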
    Reference | Related Articles | Metrics
    Extreme Multi-Label Text Classification Based on Balance Function
    CHEN Zhaohong, HONG Zhiyong, YU Wenhua, ZHANG Xin
    Computer Engineering and Applications    2024, 60 (4): 163-172.   DOI: 10.3778/j.issn.1002-8331.2209-0472
    Abstract71)      PDF(pc) (2723KB)(46)       Save
    Extreme multi-label text classification is a challenging task in natural language processing. The labeled data in this task follows a long-tailed distribution, under which models learn tail-label classification poorly and the overall classification effect suffers. To address this problem, an extreme multi-label text classification method based on a balance function is proposed. Firstly, the BERT pre-trained model is used for word embedding. Then, the concatenated outputs of the multi-layer encoder of the pre-trained model are used as the text vector representation, which captures richer text semantics and speeds up model convergence. Finally, the balance function assigns different attenuation weights to the training losses of different predicted labels, improving the method's ability to learn tail-label classification. Experimental results on the Eurlex-4K and Wiki10-31K datasets show that the evaluation metrics P@1, P@3 and P@5 reach 86.95%, 74.12% and 61.43%, and 88.57%, 77.46% and 67.90%, respectively.
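    A hedged sketch of a balance-function-style loss: the effective-number weighting (Cui et al.) below is one common way to attenuate head-label losses, not necessarily the paper's exact scheme:

```python
import torch
import torch.nn.functional as F

def balanced_bce(logits, targets, label_freq, beta=0.999):
    """logits, targets: (B, L); label_freq: (L,) training-set count per label.
    Rare (tail) labels receive larger weights, so their losses decay less."""
    w = (1 - beta) / (1 - beta ** label_freq.float().clamp(min=1))  # (L,)
    w = w / w.mean()                                 # keep the overall loss scale
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (w * bce).mean()

# Usage with random stand-ins for a batch of 4 texts and 6 labels:
logits = torch.randn(4, 6)
targets = torch.randint(0, 2, (4, 6)).float()
freq = torch.tensor([5000, 1200, 300, 40, 8, 2])     # long-tailed label counts
loss = balanced_bce(logits, targets, freq)
```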
    Reference | Related Articles | Metrics
    Few-Shot Scene Classification with Attention Mechanism in Remote Sensing
    ZHANG Duona, ZHAO Hongjia, LU Yuanyao, CUI Jian, ZHANG Baochang
    Computer Engineering and Applications    2024, 60 (4): 173-182.   DOI: 10.3778/j.issn.1002-8331.2301-0012
    Abstract80)      PDF(pc) (2555KB)(68)       Save
    Remote sensing scene classification is a hot research topic in computer vision and is of great significance to the semantic understanding of remote sensing images. At present, deep learning based methods dominate this field, but they suffer from a lack of samples and poor model generalization in real application scenarios. Therefore, this paper proposes a few-shot remote sensing scene classification method based on an attention mechanism and designs a dual-branch similarity measurement structure. The method follows the meta-learning training strategy of dividing the dataset into tasks. Meanwhile, the input images are divided into blocks to preserve the feature distribution of the remote sensing images. A lightweight attention module is then introduced into the feature extraction network to reduce the risk of overfitting and ensure that discriminative features are obtained. Finally, a dual-branch similarity measurement module based on the earth mover's distance (EMD) is added to improve the discriminative ability of the classifier. The results show that, compared with classic few-shot learning methods, the proposed method significantly improves classification performance.
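    A minimal sketch of an EMD-based similarity between two images viewed as sets of patch embeddings, using the POT library (pip install POT); this is a generic EMD matcher under uniform patch weights, not the paper's exact dual-branch module:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def emd_similarity(f1, f2):
    """f1: (n, d), f2: (m, d) L2-normalised patch embeddings of two images.
    Returns a similarity in [-1, 1]; higher means more similar scenes."""
    cost = 1.0 - f1 @ f2.T                   # cosine distance matrix (n, m)
    a = np.full(len(f1), 1.0 / len(f1))      # uniform weight per patch
    b = np.full(len(f2), 1.0 / len(f2))
    return 1.0 - ot.emd2(a, b, cost)         # emd2 returns the transport cost

# Usage with random stand-ins for 9 patches of 64-dim features per image:
rng = np.random.default_rng(0)
f1 = rng.normal(size=(9, 64)); f1 /= np.linalg.norm(f1, axis=1, keepdims=True)
f2 = rng.normal(size=(9, 64)); f2 /= np.linalg.norm(f2, axis=1, keepdims=True)
print(emd_similarity(f1, f2))
```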
    Reference | Related Articles | Metrics
    Improving Detection and Positioning of Insulators in YOLO v7
    ZHANG Jianrui, WEI Xia, ZHANG Linxuan, CHEN Yannan, LU Jie
    Computer Engineering and Applications    2024, 60 (4): 183-191.   DOI: 10.3778/j.issn.1002-8331.2306-0094
    Abstract101)      PDF(pc) (2604KB)(92)       Save
    This paper addresses the low accuracy and high missed-detection rate caused by varying insulator sizes and background interference in power system object detection. Firstly, a convolutional block attention module (CBAM) is added to the YOLO v7 backbone so that the network attends to insulator features along both the channel and spatial dimensions, reducing missed detections. Secondly, a centralized feature pyramid (CFP) is added to the deeper layers of the network to allow information exchange and aggregation across feature maps of different scales, yielding more comprehensive insulator features and higher detection accuracy. Finally, the k-means algorithm is used to cluster the ground-truth boxes to obtain the most suitable anchor sizes for insulators. The experimental results show that the improved YOLO v7 network achieves a detection mAP (mean average precision) of 96.2%, a precision of 90.8%, and a recall of 93.8%. The improved method has broad application prospects in insulator detection for power systems.
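    A sketch of the anchor-clustering step: k-means over ground-truth box sizes with 1 - IoU as the distance, the standard way anchor sizes are derived for YOLO-style detectors (the box width/height data is assumed already extracted from the labels):

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """wh: (N, 2) array of ground-truth box (width, height) pairs.
    Returns k anchor sizes sorted by area."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # IoU between every box and every anchor, assuming shared top-left corner.
        inter = np.minimum(wh[:, None, 0], anchors[None, :, 0]) * \
                np.minimum(wh[:, None, 1], anchors[None, :, 1])
        iou = inter / (wh.prod(1)[:, None] + anchors.prod(1)[None] - inter)
        assign = iou.argmax(1)               # each box joins its best anchor
        anchors = np.array([wh[assign == i].mean(0) if (assign == i).any()
                            else anchors[i] for i in range(k)])
    return anchors[np.argsort(anchors.prod(1))]

# Usage with random stand-in boxes:
boxes = np.random.default_rng(1).uniform(10, 300, size=(500, 2))
print(kmeans_anchors(boxes))
```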
    Reference | Related Articles | Metrics