Contents of the Pattern Recognition and Artificial Intelligence section in this journal

Demonstration Learning Method for Robotic Actions and Strategies Based on the Idea of Constructing a Primitive Library
    LI Tiejun, LIU Jiaqi, LIU Jinyue, JIA Xiaohui
    Computer Engineering and Applications    2024, 60 (8): 90-98.   DOI: 10.3778/j.issn.1002-8331.2211-0261
In order to solve the problems of optimizing demonstration data and of storing and invoking actions and task strategies in robot demonstration learning, a demonstration learning method based on a primitive library is proposed. For action learning, an expert drags the manipulator through the desired actions to obtain demonstration data; a Gaussian mixture model and Gaussian mixture regression are used to improve the data quality, and the dynamic motion primitive algorithm converts the final demonstration data into the weights of basis functions. For strategy learning, task steps are created as action primitives, the obtained weights are added to the primitives, a primitive card containing the task execution strategy is built, and the primitives are stored in a primitive library. When a task is executed, primitives are called from the library in sequence: a YOLOv5 object detection network and an AlexNet image classification network detect target information to match actions and to generalize new actions from the characteristics of the original ones. The method thus learns actions and stores strategies from demonstration, and combines appropriate actions to complete tasks according to the actual goals. In an experiment on a steel-bar binding scene, 5 action primitives are created and 10 basic actions are learned through expert teaching; the robot successfully completes the binding task at the intersections of horizontal and vertical reinforcement using the action primitive library.
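The conversion of a demonstration into dynamic motion primitive (DMP) basis-function weights mentioned above can be illustrated with a minimal one-dimensional sketch. This is a generic DMP fitting routine, not the authors' implementation; the gains, the number of basis functions and the width heuristic are assumed values.

```python
import numpy as np

def fit_dmp_weights(y, dt=0.01, n_basis=20, alpha=25.0, beta=6.25, alpha_x=3.0):
    """Fit DMP forcing-term weights for a 1-D demonstrated trajectory y.

    A minimal sketch: the demonstration is differentiated numerically,
    the forcing term is recovered from the transformation system, and
    Gaussian basis-function weights are solved by locally weighted
    regression. Gains follow the common choice beta = alpha / 4.
    """
    T = len(y)
    t = np.arange(T) * dt
    yd = np.gradient(y, dt)
    ydd = np.gradient(yd, dt)
    y0, g = y[0], y[-1]
    tau = t[-1]

    # Canonical system: x(t) = exp(-alpha_x * t / tau)
    x = np.exp(-alpha_x * t / tau)

    # Forcing term implied by the demonstration
    f_target = tau**2 * ydd - alpha * (beta * (g - y) - tau * yd)

    # Gaussian basis functions spaced along the canonical variable
    c = np.exp(-alpha_x * np.linspace(0, 1, n_basis))   # centers
    h = n_basis / c                                      # widths (heuristic)
    psi = np.exp(-h * (x[:, None] - c[None, :])**2)      # (T, n_basis)

    # Locally weighted regression, one weight per basis function
    s = x * (g - y0)
    w = np.empty(n_basis)
    for i in range(n_basis):
        num = np.sum(s * psi[:, i] * f_target)
        den = np.sum(s * psi[:, i] * s) + 1e-10
        w[i] = num / den
    return w

# Example: weights for a smooth reaching-like motion
y_demo = np.sin(np.linspace(0, np.pi / 2, 200))
weights = fit_dmp_weights(y_demo)
```

A weight vector of this kind is what the method stores in each primitive card; replaying the primitive integrates the DMP equations with these weights, and changing the goal g generalizes the motion.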
E-TUP: Joint Knowledge Graph Learning Recommendation Method Incorporating E-CP and TUP
    ZHAO Bo, WANG Yujia, NI Ji
    Computer Engineering and Applications    2024, 60 (8): 99-109.   DOI: 10.3778/j.issn.1002-8331.2211-0464
At present, most methods that introduce knowledge graphs into recommendation systems only introduce known surface entities, without predicting and mining the intrinsic relationships of the graph, and thus cannot exploit its hidden relationships. In this paper, the joint learning recommendation model E-TUP (enhance towards understanding of user preference) is proposed to address this problem, and E-CP (enhance canonical polyadic) is used to complete the knowledge graph and deliver more complete information. A storage-space negative sampling method is used to store and update high-quality negative triples during training, improving the quality of negative triples in knowledge graph completion. Experimental results on link prediction show that the storage-space approach improves the link prediction accuracy of the E-TUP model by up to 10.3% compared to existing models. Recommendation experiments on the MovieLens-1m and DBbook2014 datasets achieve the best results on several evaluation metrics, with improvements of up to 5.5%, indicating that E-TUP can effectively exploit the hidden relationships in the knowledge graph to improve recommendation accuracy. Finally, recommendation experiments based on automotive maintenance data show that E-TUP can effectively recommend relevant knowledge.
    Improved Deeplabv3+ Crop Classification Method Based on Double Attention Fusion
    GUO Jin, SONG Tingqiang, SUN Yuanyuan, GONG Chuanjiang, LIU Yalin, MA Xinglu, FAN Haisheng
    Computer Engineering and Applications    2024, 60 (8): 110-120.   DOI: 10.3778/j.issn.1002-8331.2211-0468
In recent years, convolutional neural networks (CNN) have made new progress in crop classification research, but they show limitations in modeling long-range dependence and are deficient in capturing the global characteristics of crops. In view of these problems, Transformer is introduced into the Deeplab v3+ model, and a parallel-branch structure for crop classification of drone images, the DeepTrans (Deeplab v3+ with Transformer) model, is proposed. DeepTrans combines Transformer and CNN in a parallel way, which is conducive to the effective capture of both global and local features. Transformer is introduced to enhance long-range dependence among information in the image and to improve the extraction of global crop information. A channel attention mechanism and a spatial attention mechanism are added to enhance the sensitivity of Transformer to channel information and the ability of ASPP (atrous spatial pyramid pooling) to capture crop spatial information. The experimental results show that the MIoU of the DeepTrans model reaches 0.812, which is 3.9% higher than that of the Deeplab v3+ model. The accuracy of the model in classifying five crops is improved; for sugarcane, corn and banana, which are easily misclassified, IoU increases by 2.9%, 4.7% and 13% respectively. The DeepTrans model thus has a better segmentation effect in the internal filling and global prediction of crop classification images, which helps to monitor the planting structure and scale of farmland crops more timely and accurately.
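The dual attention added to DeepTrans can be sketched with a CBAM-style channel/spatial attention pair in PyTorch. This is an illustrative stand-in, not the paper's exact modules; the reduction ratio and kernel size are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze global context per channel, then reweight the channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
    def forward(self, x):                      # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))     # average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))      # max-pooled descriptor
        w = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * w

class SpatialAttention(nn.Module):
    """Reweight spatial positions from pooled channel statistics."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
    def forward(self, x):
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

feat = torch.randn(2, 256, 32, 32)             # e.g. an ASPP output
feat = SpatialAttention()(ChannelAttention(256)(feat))
```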
    Approximate Markov Blanket Feature Selection Method Based on Lasso Fusion
    LIU Ming, DU Jianqiang, LI Zhiqin, LUO Jigen, NIE Bin, ZHANG Mengting
    Computer Engineering and Applications    2024, 60 (8): 121-130.   DOI: 10.3778/j.issn.1002-8331.2212-0094
In feature selection, approximate Markov blankets are often used to judge redundant features, but the redundant features they identify are not identical; deleting them directly may therefore cause information loss and reduce model accuracy. To address this, an approximate Markov blanket feature selection method based on Lasso fusion is proposed for high-dimensional, small-sample traditional Chinese medicine metabonomics data. The method has two stages. In the first stage, irrelevant features are filtered out by measuring feature relevance with the maximal information coefficient. In the second stage, approximate Markov blankets are used to construct groups of similar features, Lasso evaluates the influence of the features within each group, and redundant features are removed iteratively. Experimental results show that the algorithm reduces the loss of useful information, removes irrelevant and redundant features, and improves model accuracy and stability.
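A minimal sketch of the two-stage selection follows. Mutual information is used here as a stand-in for the maximal information coefficient (true MIC would need a library such as minepy), and correlation grouping is a rough proxy for approximate-Markov-blanket construction; both thresholds are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LassoCV

# High-dimensional, small-sample stand-in data
X, y = make_classification(n_samples=80, n_features=200, n_informative=10,
                           random_state=0)

# Stage 1: relevance filter (mutual information standing in for MIC)
mi = mutual_info_classif(X, y, random_state=0)
relevant = np.where(mi > np.percentile(mi, 70))[0]

# Stage 2: group strongly correlated survivors (a rough proxy for
# approximate-Markov-blanket similarity), keep the feature with the
# largest Lasso coefficient in each group, drop the rest as redundant.
lasso = LassoCV(cv=5, random_state=0).fit(X[:, relevant], y)
coef = np.abs(lasso.coef_)
corr = np.abs(np.corrcoef(X[:, relevant], rowvar=False))

selected, dropped = [], set()
for i in np.argsort(-coef):                     # strongest features first
    if i in dropped:
        continue
    selected.append(relevant[i])
    dropped.update(np.where(corr[i] > 0.8)[0])  # prune its redundant group
print("kept features:", selected)
```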
Lightweight Cross-Modal Re-Identification Network Combined with Data Augmentation
    CAO Ganggang, WANG Banghai, SONG Yu
    Computer Engineering and Applications    2024, 60 (8): 131-139.   DOI: 10.3778/j.issn.1002-8331.2212-0100
Among existing cross-modal re-identification methods, research on lightweight networks is scarce. Considering the hardware deployment requirements of lightweight networks, a new cross-modal re-identification lightweight network is proposed. Based on OSNet, the feature extractor and feature embedder are split. At the same time, data augmentation operations are used to make the most of limited datasets and improve network robustness, and the hard triplet loss is improved to further reduce computation and the difference between modalities, thereby improving identification accuracy. The network is lightweight, simple in structure and remarkable in effect. In the all-search mode of the SYSU-MM01 dataset, the rank-1 and mAP of the proposed method reach 65.56% and 61.36% respectively, with only 1.92×106 parameters.
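The hard triplet loss that the network improves upon has a standard batch-hard form, sketched below in PyTorch; the paper's modified variant differs in details not given in the abstract.

```python
import torch

def batch_hard_triplet_loss(emb, labels, margin=0.3):
    """Standard batch-hard triplet loss: for each anchor, take its
    hardest (farthest) positive and hardest (closest) negative
    within the mini-batch."""
    dist = torch.cdist(emb, emb)                       # (N, N) pairwise L2
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (N, N) same-identity
    eye = torch.eye(len(labels), dtype=torch.bool, device=emb.device)

    hardest_pos = (dist * (same & ~eye)).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()

# Toy batch: 4 identities, 2 samples each
emb = torch.randn(8, 128, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = batch_hard_triplet_loss(emb, labels)
loss.backward()
```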
Model Robustness Enhancement Algorithm with Scale-Invariant Condition Number Constraint
    XU Yangyu, GAO Baoyuan, GUO Jielong, SHAO Dongheng, WEI Xian
    Computer Engineering and Applications    2024, 60 (8): 140-147.   DOI: 10.3778/j.issn.1002-8331.2212-0114
Deep neural networks are vulnerable to adversarial examples, which threatens their application in safety-critical scenarios. Based on the explanation that adversarial examples arise from the highly linear behavior of neural networks, a model robustness enhancement algorithm based on a scale-invariant condition number constraint is proposed. Firstly, the norms of all weight matrices are calculated during adversarial training, and the scale-invariant constraint term is obtained through the logarithmic function. Secondly, this condition number constraint term is incorporated into the outer optimization of adversarial training, and the condition numbers of all weight matrices are iteratively reduced through backpropagation, so that the linear transformations of the neural network operate in a well-conditioned high-dimensional weight space, improving robustness against adversarial perturbations. The algorithm is suitable for visual models of both convolutional and Transformer architectures. It not only significantly improves robust accuracy against white-box attacks such as PGD and AutoAttack, but also effectively enhances robustness against black-box attacks including the square attack. When the proposed constraint is incorporated during adversarial training of a Transformer-based image classification model, the condition numbers of the weight matrices drop by 20.7% on average, and robust accuracy against PGD attacks increases by 1.16 percentage points. Compared with similar methods such as Lipschitz constraints, the proposed method also improves accuracy on clean examples and alleviates the poor generalization caused by adversarial training.
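The constraint can be sketched as a penalty added to the training loss: the log condition number of each weight matrix, computed from its singular values. This is a minimal PyTorch illustration of the idea, not the paper's implementation; the penalty weight is an assumed hyperparameter, and the inner adversarial-example generation loop is omitted.

```python
import torch
import torch.nn as nn

def condition_number_penalty(model, eps=1e-12):
    """Sum of log condition numbers over all weight matrices.
    log(cond(W)) = log(sigma_max) - log(sigma_min) is scale-invariant:
    multiplying W by a constant leaves the penalty unchanged."""
    penalty = 0.0
    for p in model.parameters():
        if p.dim() < 2:
            continue
        W = p.flatten(1)                 # treat conv kernels as matrices
        s = torch.linalg.svdvals(W)      # singular values, descending
        penalty = penalty + torch.log(s[0] + eps) - torch.log(s[-1] + eps)
    return penalty

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
# In real adversarial training, x would be replaced by PGD-perturbed x_adv
loss = nn.functional.cross_entropy(model(x), y) \
       + 1e-3 * condition_number_penalty(model)
loss.backward()
```

Because log(sigma_max) - log(sigma_min) is unchanged when W is multiplied by a constant, the penalty constrains conditioning without penalizing the overall weight scale, which is the scale invariance the title refers to.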
    Algorithm Research Based on Multi-Feature Fusion of EEG Signals with Convolutional Neural Networks
    SONG Shilin, ZHANG Xuejun
    Computer Engineering and Applications    2024, 60 (8): 148-155.   DOI: 10.3778/j.issn.1002-8331.2212-0301
In order to address the low classification accuracy of motor imagery electroencephalogram (EEG) signals, a feature extraction algorithm based on the fusion of sample entropy and common spatial pattern (CSP) features is proposed. The algorithm first performs wavelet packet decomposition on the raw EEG signal and selects the components containing the μ and β rhythms for reconstruction. The sample entropy and CSP features of the reconstructed signal are then extracted separately. These two features are fused into a new feature vector, which is recognized by a one-dimensional convolutional neural network designed in this paper to obtain the classification result. The proposed method achieves a classification accuracy of 91.66% on the 2003 BCI Dataset III and an average classification accuracy of 85.29% on the 2008 BCI Dataset A. Compared with multi-feature fusion algorithms proposed in recent literature, the accuracy is improved by 7.96 percentage points.
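Sample entropy, one of the two fused features, has a compact standard definition, sketched below with NumPy; the tolerance r = 0.2 times the standard deviation and m = 2 are conventional choices, not necessarily the paper's.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy SampEn(m, r) of a 1-D signal: the negative log of
    the conditional probability that subsequences matching for m points
    (within tolerance r, Chebyshev distance) also match for m + 1."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()

    def count_matches(m):
        templates = np.array([x[i:i + m] for i in range(len(x) - m)])
        d = np.max(np.abs(templates[:, None] - templates[None, :]), axis=2)
        return (d <= r).sum() - len(templates)   # exclude self-matches

    B, A = count_matches(m), count_matches(m + 1)
    return -np.log(A / B)

rng = np.random.default_rng(0)
print(sample_entropy(rng.standard_normal(300)))  # higher = more irregular
```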
    Joint Entity Relation Extraction Model Based on Interactive Attention
    HAO Xiaofang, ZHANG Chaoqun, LI Xiaoxiang, WANG Darui
    Computer Engineering and Applications    2024, 60 (8): 156-164.   DOI: 10.3778/j.issn.1002-8331.2301-0154
The quality of entity-relation triple extraction directly affects the later construction of knowledge graphs. Traditional pipeline and joint extraction models do not effectively model the semantic features at the sentence level and the relation level, which limits model performance. To this end, a joint entity and relation extraction model, RSIAN, is proposed that fuses sentence-level and relation-level semantic features, learning their higher-order semantic associations through an interactive attention network to enhance the interaction between sentences and relations and assist extraction decisions. On the Chinese tourism dataset (TDDS) constructed in this paper, the precision, recall and F1 values reach 0.872, 0.760 and 0.812 respectively, all outperforming current mainstream models. To further validate joint extraction performance in English, experiments are conducted on the publicly available NYT and WebNLG datasets. The F1 values increase by 0.014 and 0.013 over the baseline RSAN model, and the model also outperforms the baseline in the analysis experiments on overlapping triples.
    Bi-Bi-Modality with Bi-Gated Fusion in Multimodal Sentiment Analysis
    LIU Qingwen, Mairidan·Wushouer, Gulanbaier·Tuerhong
    Computer Engineering and Applications    2024, 60 (8): 165-172.   DOI: 10.3778/j.issn.1002-8331.2302-0088
In order to balance the uneven distribution of emotional information across modalities and obtain a deeper multimodal emotional representation, this paper proposes a bi-bi-modality method with bi-gated fusion for multimodal sentiment analysis (BBBGF). In fusing the text-vision and text-audio modality pairs, the dominant position of the text modality among the three modalities is fully considered. At the same time, dual fusion is used to obtain deeper multimodal emotional interaction information. In the first fusion, a fusion gate decides how much knowledge from the supplementary modality is added to the main modality, yielding two bi-modality hybrid knowledge matrices. In the second fusion, considering the redundant and repeated information in these two matrices, a selection gate selects effective, non-repeating emotional information as the final knowledge. On the public dataset CMU-MOSEI, the accuracy and F1 value of binary sentiment classification reach 86.2% and 86.1% respectively, showing good robustness and advancement.
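The fusion gate described above can be sketched as a generic gated additive fusion layer in PyTorch; the dimensions and the element-wise gating form are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusionGate(nn.Module):
    """Generic gated fusion: a sigmoid gate computed from both inputs
    decides, element-wise, how much of the supplementary modality is
    added to the dominant (here: text) representation."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, main, supp):
        g = torch.sigmoid(self.gate(torch.cat([main, supp], dim=-1)))
        return main + g * supp

text = torch.randn(4, 50, 256)    # (batch, seq, dim) text features
audio = torch.randn(4, 50, 256)   # temporally aligned audio features
fused_text_audio = FusionGate(256)(text, audio)
```

A second, analogous gate over the two resulting bi-modality matrices would play the role of the selection gate.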
    Multiview Interaction Learning Network for Multimodal Aspect-Level Sentiment Analysis
    WANG Xuyang, PANG Wenqian, ZHAO Lijie
    Computer Engineering and Applications    2024, 60 (7): 92-100.   DOI: 10.3778/j.issn.1002-8331.2210-0288
Previous multimodal aspect-level sentiment analysis methods only use the general text and picture representations from pre-trained models, which are insensitive to the correlation between aspects and opinion words and cannot dynamically weigh the contribution of picture information to word representations, so they cannot fully capture the correlations between the modalities and the aspects. Aiming at these problems, a multiview interaction learning network is proposed. To make full use of the global features of the text in multimodal interaction, sentence features are extracted from the context view and the syntax view respectively, and the relationships among text, picture and aspect are modeled to realize multimodal interaction. At the same time, the interactive representations of different modalities are fused to dynamically obtain the contribution of visual information to each word in the text, and the correlations between modalities and aspects are fully extracted. Finally, the sentiment classification results are obtained through a fully connected layer and a Softmax layer. Experiments on two datasets show that this model can effectively enhance multimodal aspect-level sentiment classification.
Recommendation Method for Reducing Unrelated Neighborhoods by Combining Item Attribute Collaboration Signals
    ZHAO Wentao, XUE Saili, LIU Tiantian
    Computer Engineering and Applications    2024, 60 (7): 101-107.   DOI: 10.3778/j.issn.1002-8331.2211-0042
In recommendation systems, knowledge graphs (KG) are used as auxiliary information to improve algorithm performance and interpretability. However, when aggregating multi-hop neighbors, existing methods usually aggregate and propagate the information of all entities. Not all information in a KG helps recommendation, and when neighborhood information is aggregated indiscriminately, entity embeddings are interfered with by unrelated entities. Aiming at these problems, this paper proposes a model that combines item attribute cooperative signals with a strategy for screening highly relevant neighborhoods (RUNCS) to improve recommendation. Specifically, users who have clicked on the same item are first identified as similar neighbors, and the cooperative set of item attributes is obtained by combining the items clicked by similar neighbors with the item attributes in the KG. Secondly, item attribute similarity is calculated to obtain a correlation score, which is used to screen highly correlated neighbors. Finally, an attention mechanism is used to allocate weights and aggregate the neighborhood information. Experimental results on two benchmark datasets, music and film, show that compared with existing mainstream methods, the AUC of CTR prediction of this model increases by 0.6 to 2.7 percentage points.
    Research on Gesture Recognition Based on Improved YOLOv5 and Mediapipe
    NI Guangxing, XU Hua, WANG Chao
    Computer Engineering and Applications    2024, 60 (7): 108-118.   DOI: 10.3778/j.issn.1002-8331.2308-0097
Existing gesture recognition algorithms suffer from heavy computation and poor robustness. In this paper, a gesture recognition method based on the IYOLOv5-Med (improved YOLOv5 Mediapipe) algorithm is proposed, which combines an improved YOLOv5 algorithm with the Mediapipe method and comprises gesture detection and gesture analysis. For gesture detection, the traditional YOLOv5 algorithm is improved: firstly, the C3 module is reconstructed with FastNet; secondly, the CBS module is replaced with the GhostConv module from GhostNet; thirdly, an SE attention mechanism module is introduced at the end of the backbone network. The improved algorithm has a smaller model size and is more suitable for edge devices with limited resources. For gesture analysis, a Mediapipe-based method is proposed: hand keypoints are detected in the gesture region located by the detection stage, relevant features are extracted, and these are then identified by a naive Bayes classifier. The experimental findings affirm the efficacy of the IYOLOv5-Med algorithm. Compared with the conventional YOLOv5 algorithm, parameters are reduced by 34.5%, computation by 34.9%, and model weight by 33.2%. The final average recognition rate reaches 0.997, and the implementation is relatively simple, giving the method good application prospects.
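The gesture analysis stage can be sketched with the public Mediapipe Hands API and a scikit-learn naive Bayes classifier. Using raw normalized landmark coordinates as the feature vector is an assumption (the paper derives its own features from the keypoints), and the training data below is a hypothetical placeholder.

```python
import cv2
import mediapipe as mp
import numpy as np
from sklearn.naive_bayes import GaussianNB

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

def landmark_features(bgr_image):
    """Return the 21 hand landmarks as a flat (63,) vector, or None.
    Raw normalized (x, y, z) coordinates are an assumed feature choice."""
    result = hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm]).ravel()

# Hypothetical training set: 4 gesture classes, 10 samples each
X = np.random.rand(40, 63)
y = np.repeat(np.arange(4), 10)
clf = GaussianNB().fit(X, y)

# frame = cv2.imread("gesture.jpg")         # cropped by the detector
# feat = landmark_features(frame)
# if feat is not None:
#     print(clf.predict(feat[None, :]))
```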
    Temporal Event Prediction Based on Implicit Relationship of Multiple Sequences
    HAO Zhifeng, LIU Jun, WEN Wen, CAI Ruichu
    Computer Engineering and Applications    2024, 60 (7): 119-127.   DOI: 10.3778/j.issn.1002-8331.2211-0137
Temporal event prediction refers to predicting the next event from historical events, where an event has time and type attributes. Current work focuses on one-sided prediction (event time or event type), which cannot answer more detailed joint questions such as "what will happen, and when". The challenges are as follows: first, event types are very diverse and behavior is often highly sparse, which makes prediction difficult; second, event time and event type belong to two different domains, and combining the information of the two domains is also a challenge. In response, an approach is explored from the perspective of fusing hidden information across multiple sequences. Firstly, based on the observation that certain event sequences exhibit pattern similarity with each other, it models a hidden relationship graph over event sequences and uses the information of neighboring sequences to alleviate behavioral sparsity; secondly, through suitably designed neural network modules, it maps the time-domain and type-domain information of events into a common abstract space, solving the fusion modeling problem of event time and event type. Extensive experiments on several real datasets corroborate that the multiple-sequence deep temporal model outperforms a series of existing benchmark models.
Demand Aware Attention Graph Neural Network for Session-Based Recommendation
    ZHENG Xiaoli, WANG Wei, DU Yuxuan, ZHANG Chuang
    Computer Engineering and Applications    2024, 60 (7): 128-140.   DOI: 10.3778/j.issn.1002-8331.2211-0248
Aiming at the problem that existing graph-based session recommendation methods ignore the noise caused by the uncertainty of user behavior in feedback data and thus cannot accurately and effectively capture user preferences, a demand-aware attention graph neural network for session-based recommendation (DAAGNNSR) is proposed. Firstly, the session data with time series is constructed as a graph, and node embeddings on the graph are learned with a graph neural network. Secondly, the extracted item features are linearly aggregated into a latent user-demand matrix by a demand-aware aggregator to automatically attenuate noise, while a low-rank multi-head attention network interacts with all item features item by item to generate demand-enhanced item representations. Then, joint independent position encoding further analyzes the sequential associations between items, and the resulting position embeddings are linearly fused with the item representations. Finally, a ranked recommendation list is generated by the prediction layer. The model is trained and tested on three common datasets, Diginetica, Tmall and Nowplaying; the experimental results show that its recommendation accuracy is better than other baselines on all metrics: compared with the graph contextual self-attention network for session-based recommendation (GCSAN), NDCG@10 on Diginetica improves by 5.6% and Recall@10 on Tmall by 6.4%; compared with the graph neural network based SR-GNN, Precision@10 on Tmall improves by 5.0%, a significant improvement in recommendation performance.
    Applying Attention Transformer Module to 3D Lip Sequence Identification
    PIAN Xinyang, WANG Yu, ZHANG Jie
    Computer Engineering and Applications    2024, 60 (7): 141-146.   DOI: 10.3778/j.issn.1002-8331.2211-0295
Lip behavior is a newly emerging biometric recognition technology, and 3D lip point cloud sequences have become an important biometric feature for individual identification because they contain the real spatial structure and motion information of the lips. However, the disorder and unstructured nature of 3D point clouds make spatio-temporal feature extraction very difficult. To this end, a deep learning network model based on a point-feature Transformer is proposed for 3D lip sequence identification. The network uses an improved four-layer PointNet++ as its backbone to extract features hierarchically, and an attention Transformer module with dynamic lip features is designed and added behind each PointNet++ layer to learn more spatio-temporal features containing identity information, which helps learn the relevant information among different feature maps and effectively capture contextual information across video sequence frames. Compared with Transformers built on other attention mechanisms, the proposed module has fewer parameters, and experimental results on the S3DFM-FP and S3DFM-VP datasets show that the proposed network model is effective for the identification of 3D lip point cloud sequences. Even on the pose-unconstrained S3DFM-VP dataset, the model shows better performance.
    Conformer-Based Speaker Recognition Model for Real-Time Multi-Scenarios
    XUAN Xi, HAN Runping, GAO Jingxin
    Computer Engineering and Applications    2024, 60 (7): 147-156.   DOI: 10.3778/j.issn.1002-8331.2210-0145
To handle the poor performance of speaker verification systems in scenarios involving cross-domain, long-duration and noisy utterances, a real-time robust speaker recognition model, PMS-Conformer, is designed based on the Conformer. Its architecture is inspired by the state-of-the-art model MFA-Conformer. PMS-Conformer improves on the acoustic feature extractor, the network components and the loss calculation module of MFA-Conformer, yielding a novel and effective acoustic feature extractor and a robust speaker embedding extractor with high generalization capability. PMS-Conformer is trained on the VoxCeleb1&2 datasets and compared with the baselines MFA-Conformer and ECAPA-TDNN in extensive speaker verification experiments. The results show that on VoxMovies with cross-domain utterances, SITW with long-duration utterances, and VoxCeleb-O with noise added to its utterances, the ASV system built with PMS-Conformer is more competitive than those built with MFA-Conformer and ECAPA-TDNN. Moreover, the trainable parameters and real-time factor (RTF) of its speaker embedding extractor are significantly lower than those of ECAPA-TDNN. All evaluation results demonstrate that PMS-Conformer performs well in real-time multi-scenario conditions.
    Image-Guided Augmentation Visual Question Answering Model Combined with Contrastive Learning
    YANG You, YAO Lu
    Computer Engineering and Applications    2024, 60 (7): 157-166.   DOI: 10.3778/j.issn.1002-8331.2211-0447
Aiming at two problems of existing attention-based encoder-decoder visual question answering (VQA) models, an image-guided augmentation VQA model combined with contrastive learning (IGA-CL) is proposed. The first problem is that a single type of image feature contains incomplete visual information; the second is that existing models rely overly on question guidance. To solve the first problem, the dual-feature visual decoder (DFVD) is proposed. Based on the Transformer language encoder, the single image feature is extended into two types, region and grid, and visual information is refined by constructing complementary spatial relations from the relative positions of the two feature types. To solve the second problem, the vision-guided language decoder (VGLD) is proposed, which matches the two decoded image features with the question features twice; within it, parallel gated guided-attention (PGGA) is designed to adaptively correct the guiding proportions of different image features to the question. To obtain more similar mutual information, a contrastive learning loss function is introduced during training, comparing the similarity of different modal features in the hidden space during model reasoning. The proposed model achieves 73.82%, 72.49% and 57.44% overall accuracy on VQA 2.0, COCO-QA and GQA respectively, which is 2.92, 4.41 and 0.8 percentage points better than the MCAN model. Extensive ablation experiments and visualization analysis demonstrate the effectiveness of the proposed model; the results show that it obtains more relevant language-vision information and generalizes better across question types.
    Medical Report Extraction Generation Model Integrated with BioCopy Mechanism
    LIU Lan, TAN Hongye
    Computer Engineering and Applications    2024, 60 (6): 155-162.   DOI: 10.3778/j.issn.1002-8331.2210-0071
Wise information technology of med (WITMED) is a new health care service mode that integrates information technologies such as artificial intelligence, and automatic generation of medical reports is an important task in this field. The task generates semi-structured medical reports from patient self-reports and doctor-patient dialogues. A medical report contains the chief complaint and other sub-parts, as well as a large number of medical terms from the original text. In view of these characteristics, a summary model integrating extraction and abstraction with the BioCopy mechanism is adopted. Firstly, the model extracts key sentences for each sub-part to eliminate the interference of irrelevant information. Then, the BioCopy mechanism is added during generation to copy the medical terms in the key sentences, ensuring the accuracy of the results. Experimental results on the CCL 2021 dataset show that this model is superior to the main baselines and achieves good results.
    Deep Neural Network Channel Pruning Compression Method for Filter Elasticity
    LI Ruiquan, ZHU Lu, LIU Yuanyuan
    Computer Engineering and Applications    2024, 60 (6): 163-171.   DOI: 10.3778/j.issn.1002-8331.2210-0420
Deep neural networks (DNN) have achieved great success in various fields, but their high computation and storage costs make direct deployment to resource-constrained mobile devices difficult. To solve this problem, the importance evaluation of global filters in the network is studied, and a channel pruning compression method based on filter elasticity is proposed to reduce network size. Firstly, the method sets local dynamic thresholds between layers to alleviate the over-pruning of L1-regularized (lasso) sparse training. The output is then multiplied by a channel scaling factor to replace the ordinary convolution layer module. The importance of a global filter is defined by its elastic size, estimated and ranked via the Taylor formula. At the same time, a new iterative filter pruning framework is designed to balance pruning performance against pruning speed. Finally, composite channels are pruned using the improved L1-regularized training and the global filter importance. VGG-16 tested on CIFAR-10 with the proposed method achieves an 80.2% reduction in floating-point operations (FLOPs) and a 97.0% reduction in parameters without significant loss of accuracy, indicating that the method can compress neural networks at a large scale for deployment on resource-constrained terminal devices.
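Taylor-formula importance ranking over channel scaling factors can be sketched as follows: after one backward pass, the first-order Taylor estimate of the loss change from removing a channel is |gamma * dL/dgamma| on that channel's scale factor. This minimal PyTorch illustration uses BatchNorm scale factors as the channel gates; the paper's elasticity definition and iterative framework are not reproduced here.

```python
import torch
import torch.nn as nn

# A toy conv network whose BatchNorm scale factors act as channel gates
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))

x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
nn.functional.cross_entropy(model(x), y).backward()

# First-order Taylor importance per channel: |gamma * dL/dgamma|.
# Removing a channel changes the loss by roughly this magnitude.
scores = []
for name, m in model.named_modules():
    if isinstance(m, nn.BatchNorm2d):
        imp = (m.weight * m.weight.grad).abs()
        scores += [(name, c, v.item()) for c, v in enumerate(imp)]

scores.sort(key=lambda t: t[2])   # global ranking, least important first
print(scores[:5])                 # candidate channels to prune
```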
    Research on Angle-Optimised Grasp Detection Algorithm Based on YOLOv5
    CHEN Chunchao, SUN Donghong
    Computer Engineering and Applications    2024, 60 (6): 172-179.   DOI: 10.3778/j.issn.1002-8331.2210-0499
Aiming at the problems that current robot grasp detection methods predict the grasp angle over an overly discrete set and may produce large angle deviations during grasping, which reduces detection accuracy and can even lead to grasp failure, an improved real-time grasp detection method based on the YOLOv5 neural network model is proposed. Firstly, grasp box coordinates and grasp angles are extracted based on the single-stage object detection model YOLOv5. The grasp angles are then divided more finely, and a circular smooth label is introduced to accommodate the periodicity of angles and establish links between adjacent angles; the YOLOv5 detection head is decoupled and the loss function is optimized to improve detection accuracy. Finally, experimental validation is performed on the Cornell dataset. The results show that, compared with classical grasp detection methods, the proposed algorithm predicts the grasp angle better and improves grasp detection accuracy, achieving 97.5% accuracy and a detection speed of 71 FPS on the Cornell dataset.
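The circular smooth label (CSL) encoding can be sketched in a few lines: each angle becomes a periodic Gaussian over discrete bins, so neighboring bins, including those across the wrap-around, receive partial credit. The bin count and window width below are assumed values.

```python
import numpy as np

def circular_smooth_label(angle_deg, n_bins=180, sigma=4.0):
    """Encode an angle as a periodic Gaussian over discrete bins, so the
    loss treats 179 degrees and 0 degrees as neighbors rather than as
    maximally different classes."""
    bins = np.arange(n_bins)
    target = int(round(angle_deg)) % n_bins
    d = np.minimum(np.abs(bins - target), n_bins - np.abs(bins - target))
    label = np.exp(-d**2 / (2 * sigma**2))
    return label / label.max()

label = circular_smooth_label(178.0)
print(label[[176, 177, 178, 179, 0, 1]])   # mass wraps around 180 -> 0
```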
    Dynamic Dominant Fusion Multimodal Sentiment Analysis Method Based on Autoencoder
    YANG Xi, GUO Junjun, YAN Haining, TAN Kaiwen, XIANG Yan, YU Zhengtao
    Computer Engineering and Applications    2024, 60 (6): 180-187.   DOI: 10.3778/j.issn.1002-8331.2211-0010
In multimodal sentiment analysis, the modality that dominates sentiment determination is dynamic. Traditional methods usually regard the text modality as dominant, ignoring that, owing to differences between modalities, the dominant modality changes from moment to moment. To select the dominant modality dynamically at each moment, this paper proposes a dynamic dominant fusion multimodal sentiment analysis method based on an autoencoder. The method first encodes the single modalities and obtains multimodal fusion features, and an autoencoder maps them into a shared space. In this space, the dominant modality is selected by the correlation between each unimodal representation and the fused one. Finally, the dominant modality information guides multimodal fusion to obtain a robust multimodal representation. Extensive experiments on the multimodal sentiment analysis benchmark CMU-MOSI demonstrate the effectiveness of the proposed method, which outperforms most existing state-of-the-art multimodal sentiment analysis methods.
    Medical Named Entity Recognition Based on Multi-Feature and Co-Attention
    LIU Xinning
    Computer Engineering and Applications    2024, 60 (6): 188-198.   DOI: 10.3778/j.issn.1002-8331.2211-0094
Aiming at the problems that current Chinese medical named entity recognition does not fuse the unique feature information of medical texts, preventing effective improvement of recognition accuracy, and that a single attention mechanism limits entity classification, a Chinese medical named entity recognition method based on multi-feature fusion and a co-attention mechanism is proposed. Firstly, the vector representation of the original medical text is obtained with a pre-trained model, and word-granularity feature vectors are obtained with a bidirectional gated recurrent neural network (BiGRU). Secondly, exploiting the distinctive radical features of medical named entities, an iterated dilated convolutional neural network (IDCNN) is used to extract radical-level feature vectors. Finally, a co-attention network integrates the medical vector features to generate dual correlation features of <Characters-Radicals> pairs, and a conditional random field (CRF) outputs the entity recognition results. Experimental results show that, compared with other entity recognition models, the method achieves higher precision, recall and F1 value on the CCKS dataset; although model complexity increases, performance does not degrade significantly.
    Chinese Named Entity Recognition Methods Combined with Entity Boundary Cues
    HUANG Rong, CHEN Yanping, HU Ying, HUANG Ruizhang, QIN Yongbin
    Computer Engineering and Applications    2024, 60 (6): 199-206.   DOI: 10.3778/j.issn.1002-8331.2211-0119
As a basic task in information extraction, named entity recognition (NER) provides effective support for machine translation, relation extraction and other downstream tasks, and is of great research significance. To tackle the problem of fuzzy entity boundaries in Chinese named entity recognition, a named entity recognition model combining entity boundary cues is proposed. The model consists of three modules: boundary detection, cue generation and entity classification. Firstly, the boundary detection module identifies entity boundaries. Then, the cue generation module generates entity spans from the boundary information, producing a text sequence with boundary cue labels. Through these labels, the model can perceive the entity boundaries in a sentence and learn the semantic dependencies between entity boundaries and context. Finally, the labeled text sequence is fed to the entity classification module, where a Biaffine mechanism enhances the semantic interaction between labels, and the joint prediction of a multilayer perceptron and the Biaffine mechanism gives the entity recognition result. The F1 values of this model on the ACE2005 Chinese dataset and the Weibo dataset reach 90.47% and 73.54% respectively, verifying its effectiveness for Chinese named entity recognition.
    Visual Question Answering Research on Joint Knowledge and Visual Information Reasoning
    SU Zhenqiang, GOU Gang
    Computer Engineering and Applications    2024, 60 (5): 95-102.   DOI: 10.3778/j.issn.1002-8331.2209-0456
As a multimodal task, visual question answering requires fusing and reasoning over the features of different modalities and has important application value. In traditional visual question answering, the answer can be inferred from the visual information of the image alone; however, pure visual information cannot meet the diverse question answering needs of real-world scenarios. Knowledge plays an important role in visual question answering and can effectively assist it; knowledge-based open visual question answering needs to associate external knowledge to achieve cross-modal scene understanding. To better integrate visual information and related external knowledge, a bilinear structure for joint knowledge and visual information reasoning is proposed, with a dual-guided attention module in which image features and question features jointly guide the knowledge representation. Firstly, the model uses a pre-trained vision-language model to obtain the feature representations and visual reasoning information of the question and image. Secondly, a similarity matrix is used to compute the image object regions semantically aligned with the question, and the aligned regional features together with the question features guide the knowledge representation to obtain knowledge reasoning information. Finally, the visual reasoning information and the knowledge reasoning information are fused to produce the final answer. Experimental results on the OK-VQA dataset show that the accuracy of the model is 1.97 and 4.82 percentage points higher than the two baseline methods respectively, verifying the effectiveness of the model.
Unilateral Adversarial Network Algorithm for Cross-Domain Face Liveness Detection
    ZENG Fanzhi, WU Chutao, ZHOU Yan
    Computer Engineering and Applications    2024, 60 (5): 103-111.   DOI: 10.3778/j.issn.1002-8331.2210-0134
In existing cross-domain face liveness detection algorithms, the feature extraction process is prone to overfitting and lacks feature aggregation, resulting in insufficient generalization. To solve this problem, this paper proposes a unilateral adversarial network algorithm for cross-domain face liveness detection. Firstly, grouped convolution and an improved inverted residual structure are fused to replace ordinary convolution, reducing network parameters and enhancing the expression of fine-grained face features, and an adaptive feature normalization module is introduced to emphasize the liveness-related regions in the image and fade the irrelevant background regions. This effectively avoids overfitting of face liveness information and enhances detection across different source domains. Secondly, a channel attention mechanism module is introduced on the basis of NetVLAD as a branch of the feature aggregation network to learn the semantic information of local features in different source domains, effectively enhancing the generalization of liveness classification across source domains. Finally, a two-module fusion network is designed to improve the accuracy of cross-domain face liveness detection in unknown scenes. Experimental results on the OULU-NPU, CASIA-FASD, MSU-MFSD and Idiap Replay-Attack datasets show that the proposed algorithm performs well in the O&C&M to I, O&C&I to M, I&C&M to O and O&M&I to C cross-dataset tests; on O&C&I to M and O&M&I to C, accuracy improves by 0.99 and 0.5 percentage points respectively.
    Multi-View Representation Model for Aspect-Level Sentiment Analysis
    XU Xuefeng, HAN Hu
    Computer Engineering and Applications    2024, 60 (5): 112-121.   DOI: 10.3778/j.issn.1002-8331.2210-0231
Fine-grained sentiment analysis of user comments toward specific aspects is a popular research topic in natural language processing. Given the flexibility of comment sentences in content expression and syntactic structure, the integrated use of lexical, syntactic and semantic knowledge to enhance the feature representation of comment sentences is a major current research direction. On this basis, a graph convolutional network model with multi-view fusion representation is proposed. First, the model learns context-based enhanced representations of comment sentences through self-attention and aspect-specific attention. Second, two different representations based on syntax and semantics are obtained through graph convolution operations using syntactic dependency information and word co-occurrence information respectively. Finally, a hierarchical fusion approach is designed over the three view representations, combining and convolving them to achieve information sharing and complementarity among the views. Experimental results on five publicly available datasets show that the model achieves better performance than existing models.
    Reverse Inference Model for Document-Level Event Extraction
    JI Wanting, MA Yuhang, LU Wenyi, WANG Junlu, SONG Baoyan
    Computer Engineering and Applications    2024, 60 (5): 122-129.   DOI: 10.3778/j.issn.1002-8331.2210-0237
    Event extraction aims to detect event types and extract event arguments from unstructured texts. Existing methods still have limitations when dealing with document-level texts. This is because a document-level text may consist of multiple events, and the event arguments that constitute an event are usually scattered across different sentences. To address the above challenges, this paper proposes a reverse inference model for document-level event extraction (RIDEE). Based on the design without trigger words, RIDEE simplifies the document-level event extraction into two sub-tasks, candidate event argument extraction and event triggering inference, to extract event arguments in parallel and detect event types. In addition, this paper designs an event dependency pool for storing historical events, so that the model can make full use of the dependencies between events when processing the multi-event texts. Experimental results on the public dataset show that RIDEE has better performance in document-level event extraction than the existing event extraction models.
    Bidirectional Interaction Model for Joint Multiple Intent Detection and Slot Filling
    LI Shi, SUN Zhenpeng
    Computer Engineering and Applications    2024, 60 (5): 130-138.   DOI: 10.3778/j.issn.1002-8331.2210-0271
Intent detection and slot filling are the two major tasks of spoken language understanding; they are highly correlated and usually trained jointly. As research on spoken language understanding progresses, it has been found that user utterances in real-life scenarios often contain multiple intents, yet some joint models can only detect a single intent and fail to adequately model the correlation between multiple intents and slots. Since multi-intent information in an utterance can guide slot filling, and slot information can in turn help intent detection, the Label Bi-Interaction model uses a graph attention network to establish a two-way interaction between intents and slots. Specifically, it associates the two tasks bidirectionally so that the model can explore the relationship between multiple intents and slots, and it introduces the label information of the two tasks so that the model can learn the relationship between utterance context and labels. This improves the accuracy of intent detection and slot filling and optimizes the overall performance of spoken language understanding. Experiments show that the model's performance on the two multi-intent datasets MixATIS and MixSNIPS is significantly improved compared to other models.
Personalized Dynamic Ensemble Model for Alzheimer's Disease Auxiliary Diagnosis
    LIANG Haolin, PAN Dan, ZENG An, YANG Baoyao, Xiaowei Song
    Computer Engineering and Applications    2024, 60 (5): 139-145.   DOI: 10.3778/j.issn.1002-8331.2211-0150
Aiming at the problem that most Alzheimer's disease (AD) classification models do not adopt sample-specific strategies, so that personalized differences between samples are easily neglected, a novel AD classification model, the personalized dynamically ensembled convolutional neural network (PDECNN), is proposed. Considering that brain regions degenerate to different degrees in different samples, PDECNN uses an attention net to evaluate the degeneration degree of each brain region for the specific input sample. Based on these estimates, a dynamic ensemble strategy is designed to select and fuse brain region features for AD identification. In addition, by redesigning the loss function, the problem that optimal gradients cannot be obtained for unselected brain regions is solved, further improving classification performance. Experimental results show that, compared with existing AD classification models, the classification accuracy of PDECNN increases by 4%, 11% and 8% in the AD vs. HC (healthy cognition), MCIc (mild cognitive impairment that converts to AD) vs. HC, and MCIc vs. MCInc (mild cognitive impairment that does not convert to AD) experiments respectively. The degenerated brain regions identified by PDECNN also correlate with the clinical manifestations of AD.
    Prompt-Learning Inspired Approach to Unsupervised Sentiment Style Transfer
    CAI Guoyong, LI Anqing
    Computer Engineering and Applications    2024, 60 (5): 146-155.   DOI: 10.3778/j.issn.1002-8331.2211-0317
Text style transfer is the task of regenerating text with desired style properties while preserving the original content. In order to improve transfer quality with non-parallel style corpora, this paper proposes a new method that guides a fill-mask model to rewrite a sentence into the target style. Overall, the approach follows the delete-retrieve-generate style transfer framework, but employs a large unsupervised pre-trained language model and the Transformer architecture. Based on the working principle of the Transformer, the method of filtering style attributes from the source sentence is first improved, and the internal knowledge of the pre-trained model is then mined through prompt learning to generate target-style words. Experiments on two sentiment benchmark datasets show that the method outperforms existing editing methods, with an average relative improvement of more than 14% on the comprehensive metrics.
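Prompting a fill-mask model to produce target-style words can be sketched with the HuggingFace pipeline API. The model choice and the prompt template below are illustrative assumptions, not the paper's configuration.

```python
from transformers import pipeline

# A minimal sketch of prompting a fill-mask model to rewrite a style word.
fill = pipeline("fill-mask", model="bert-base-uncased")

source = "the food was terrible and the service was slow"
# Delete step: mask a detected style-attribute word
masked = source.replace("terrible", fill.tokenizer.mask_token)

# Prompt step: prepend a target-style cue so the blank is filled positively
prompt = f"a very positive review : {masked}"
for cand in fill(prompt, top_k=3):
    print(cand["token_str"], round(cand["score"], 3))
```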
    Data Reconstruction Based on Quantum Generative Adversarial Networks
    JIANG Yida, WANG Mingming
    Computer Engineering and Applications    2024, 60 (5): 156-164.   DOI: 10.3778/j.issn.1002-8331.2211-0363
Data reconstruction using neural networks is an important research topic in artificial intelligence. The generative adversarial network (GAN), a popular artificial intelligence algorithm in recent years, performs well on data reconstruction tasks. Quantum computing, as a new computing paradigm that can accelerate classical computing, is steadily merging with classical artificial intelligence algorithms; among these combinations, the pure quantum generative adversarial network (QGAN) performs well on image-related tasks. However, since the fitting ability of purely quantum models still needs improvement, this paper proposes a hybrid quantum-classical generative adversarial network (Q-CGAN) based on the GAN framework for data reconstruction. The framework exploits classical nonlinearities to improve fitting performance and quantum properties to provide quantum speedups. The reconstruction performance of the hybrid model is verified on the MNIST handwritten digit dataset, and the results show that Q-CGAN reconstructs data better than a pure quantum generator. In addition, the effects of different quantum encoding schemes and different parameterized quantum circuits on reconstruction quality are also studied.
    Oversampling Method for Imbalanced Data Using Credible Counterfactual
    GAO Feng, SONG Mei, ZHU Yi
    Computer Engineering and Applications    2024, 60 (5): 165-171.   DOI: 10.3778/j.issn.1002-8331.2211-0413
A new counterfactual (CF) oversampling method for imbalanced datasets is proposed, which further removes non-credible synthetic samples, aiming to solve the problem that traditional sampling methods cannot make full use of dataset information. Its core idea is to synthesize new samples from the original instance features of the dataset. Compared with traditional oversampling by interpolation, it can fully mine the boundary decision information in the data, providing more useful information to the classifier and improving classification performance. Extensive comparative experiments are carried out on 9 KEEL and UCI imbalanced datasets with 5 different classifiers (SVM, DT, Logistic, RF, AdaBoost) and 4 traditional oversampling methods (SMOTE, B1-SMOTE, B2-SMOTE, ADASYN). The results show that the algorithm achieves higher AUC, F1 and G-mean values and can effectively address the class imbalance problem.
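A counterfactual-style synthesis step with a credibility filter can be sketched as follows. This is an illustrative variant under assumed choices (which features to swap, the 0.8 confidence threshold), not the paper's exact procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

X, y = make_classification(n_samples=300, weights=[0.9], random_state=0)
X_min, X_maj = X[y == 1], X[y == 0]

# Counterfactual-style synthesis: start from a majority instance and
# overwrite an assumed "key" feature subset with the values of its
# nearest minority neighbor, yielding a boundary-adjacent minority sample.
nn_min = NearestNeighbors(n_neighbors=1).fit(X_min)
swap = np.arange(X.shape[1] // 2)          # assumed key-feature subset
_, idx = nn_min.kneighbors(X_maj)
X_syn = X_maj.copy()
X_syn[:, swap] = X_min[idx[:, 0]][:, swap]

# Credibility filter: keep only synthetic points a reference classifier
# confidently assigns to the minority class.
clf = RandomForestClassifier(random_state=0).fit(X, y)
credible = clf.predict_proba(X_syn)[:, 1] > 0.8
X_aug = np.vstack([X, X_syn[credible]])
y_aug = np.concatenate([y, np.ones(credible.sum(), dtype=int)])
print("added", credible.sum(), "credible synthetic minority samples")
```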
    Method for Generating Summary of Judgment Documents Based on Trial Logic Steps
    YU Shuai, SONG Yumei, QIN Yongbin, HUANG Ruizhang, CHEN Yanping
    Computer Engineering and Applications    2024, 60 (4): 113-121.   DOI: 10.3778/j.issn.1002-8331.2209-0142
Judicial summarization oriented to judgment documents is a key technology for improving the ability to analyze them. As the carrier of trial activities, judgment documents accurately present the trial logic of a case. However, current summarization methods only focus on the serialized information of judgment documents, ignore their logical structure, and cannot effectively handle over-long texts and redundant information. A judgment document summary generation method based on trial logic steps is therefore proposed, adopting an "extraction + generation" approach. The extraction part uses multi-label classification to extract four sentence sets, "type, claim, fact and result", according to the logical steps of the people's court trial. The generation part obtains the summary from a fine-tuned T5-PEGASUS model, and the input text of the "fact" part is denoised with a maximum-similarity matching algorithm based on internal knowledge, further improving the summary. Experimental results show that, compared with the mainstream pointer-generator network summarization model, the proposed method improves the F1 scores of ROUGE-1, ROUGE-2 and ROUGE-L by 17.99, 21.24 and 21.86 percentage points respectively, showing that introducing logical structure into judicial summarization improves task performance.
    Speech Emotion Recognition for Imbalanced Datasets
    ZHANG Huiyun, HUANG Heming
    Computer Engineering and Applications    2024, 60 (4): 122-132.   DOI: 10.3778/j.issn.1002-8331.2209-0099
Sample balance is crucial for machine learning: on imbalanced datasets, the importance of certain classes may be higher than their sample counts suggest. This paper studies imbalanced datasets for speech emotion recognition. Firstly, the imbalanced baseline datasets EMODB and IEMOCAP are augmented with different signal-to-noise ratios, constructing the datasets EMODBM and IEMOCAPM. Secondly, six techniques, SMOTE, RandomOverSampler, SMOTEENN, ADASYN, TomekLinks and SMOTETomek, are adopted to resample the baseline datasets and build class-balanced augmented datasets. Thirdly, 21-dimensional low-level descriptor features are extracted from the baseline and augmented datasets. Finally, a novel model, MA-CapsNet, is proposed to validate the effectiveness of the resampling techniques. The results show that all emotion classes are basically balanced after resampling, which makes the learning of MA-CapsNet fairer; in addition, MA-CapsNet is more robust on the resampled datasets.
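The resampling stage maps directly onto the imbalanced-learn package, where all six named techniques share one interface. A minimal sketch with synthetic stand-ins for the 21-dimensional descriptors:

```python
from collections import Counter
from imblearn.combine import SMOTEENN, SMOTETomek
from imblearn.over_sampling import ADASYN, RandomOverSampler, SMOTE
from imblearn.under_sampling import TomekLinks
from sklearn.datasets import make_classification

# Stand-in for 21-dimensional low-level descriptors per utterance
X, y = make_classification(n_samples=500, n_features=21, n_classes=3,
                           n_informative=8, weights=[0.7, 0.2, 0.1],
                           random_state=0)
print("before:", Counter(y))

samplers = [SMOTE(random_state=0), RandomOverSampler(random_state=0),
            SMOTEENN(random_state=0), ADASYN(random_state=0),
            TomekLinks(), SMOTETomek(random_state=0)]
for sampler in samplers:
    X_res, y_res = sampler.fit_resample(X, y)   # common resampling API
    print(type(sampler).__name__, Counter(y_res))
```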
    Reference | Related Articles | Metrics
    Cross-Modality Person Re-identification Combined with Data Augmentation and Feature Fusion
    SONG Yu, WANG Banghai, CAO Ganggang
    Computer Engineering and Applications    2024, 60 (4): 133-141.   DOI: 10.3778/j.issn.1002-8331.2209-0120
    Abstract78)      PDF(pc) (2285KB)(88)       Save
    The difficulty of visible-infrared person re-identification lies in the large modal difference between images. Most existing methods alleviate the modal difference either by generating fake images through generative adversarial networks or by extracting modality-shared features from the original images. However, training a generative adversarial network consumes substantial computational resources and the generated fake images tend to introduce noise, while extracting only modality-shared features can lose important discriminative features. To address these problems, a new cross-modality person re-identification network is proposed. Firstly, automatic data augmentation is used to improve model robustness. Then, instance normalization is used in the network to reduce modal differences. Finally, the pedestrian features of different scales extracted by each layer of the network are organically fused, so that the fused features contain more discriminative features related to pedestrian identity. The proposed method achieves Rank-1/mAP of 69.47%/65.05% in the all-search mode of SYSU-MM01 and 85.73%/77.77% in the visible-to-infrared mode of RegDB, a significant improvement over existing methods.
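    A toy sketch of the two ingredients named above, instance normalization plus multi-scale feature fusion; the channel sizes and fusion rule are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class FusionBranch(nn.Module):
    """Instance normalization suppresses modality-specific style, then features
    from several backbone stages (stand-ins here) are projected and fused."""
    def __init__(self, chs=(64, 128, 256), out_dim=256):
        super().__init__()
        self.ins = nn.ModuleList(nn.InstanceNorm2d(c) for c in chs)
        self.proj = nn.ModuleList(nn.Conv2d(c, out_dim, 1) for c in chs)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, feats):   # feats: list of (B, C_i, H_i, W_i) stage outputs
        fused = sum(self.pool(p(norm(f)))
                    for f, norm, p in zip(feats, self.ins, self.proj))
        return fused.flatten(1)  # (B, out_dim) identity embedding

# Usage with random stand-ins for three backbone stages:
feats = [torch.randn(2, c, s, s) for c, s in [(64, 32), (128, 16), (256, 8)]]
emb = FusionBranch()(feats)      # -> torch.Size([2, 256])
```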
    Reference | Related Articles | Metrics
    Efficient Cross-Domain Transformer Few-Shot Semantic Segmentation Network
    FANG Hong, LI Desheng, JIANG Guangjie
    Computer Engineering and Applications    2024, 60 (4): 142-152.   DOI: 10.3778/j.issn.1002-8331.2209-0156
    Abstract78)      PDF(pc) (2740KB)(73)       Save
    Few-shot semantic segmentation aims to learn target features from only a few labeled samples and complete the semantic segmentation task. The main problems in mainstream research are low training efficiency and the restriction that meta-training and meta-testing occur in the same data domain. For this task, this paper proposes an efficient, cross-domain few-shot semantic segmentation network based on Transformer: SGFNet. In the encoding layer, a weight-shared MixVisionTransformer is used to build a siamese network that extracts the image features of the support set and query set. In the relationship calculation layer, the Hadamard product of the support-set image feature vectors and their corresponding masks is computed to extract the target feature maps, and the relationship between them and the query-set image features is calculated. In the decoding layer, the MLP decoder is improved into a proposed residual decoder that decodes features at different hierarchies to obtain the final segmentation result. Experiments show that the model needs only a single 3090 GPU and 1.5~4.0 h of training on the FSS-1000 dataset to reach the optimal 1-shot mIoU of 87.0%; cross-domain tests on PASCAL-5i and COCO-20i reach performance comparable to non-cross-domain methods, with 1-shot mIoU of 60.4% and 33.0%, respectively, proving that the model is both efficient and cross-domain capable.
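    A minimal sketch of the relationship-calculation step, assuming PyTorch feature maps; masked pooling via the Hadamard product and a cosine relation are one common realization, not necessarily SGFNet's exact operators:

```python
import torch
import torch.nn.functional as F

def relation_map(support_feat, support_mask, query_feat):
    """support_feat: (B, C, H, W); support_mask: (B, 1, H0, W0) binary;
    query_feat: (B, C, Hq, Wq). Returns a (B, Hq, Wq) relation map."""
    # Hadamard product of features and (downsampled) mask isolates the target.
    mask = F.interpolate(support_mask.float(), size=support_feat.shape[-2:],
                         mode="nearest")
    proto = (support_feat * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1)
    proto = proto[:, :, None, None]                       # (B, C, 1, 1) prototype
    # Relation between the target prototype and every query location.
    return F.cosine_similarity(query_feat, proto, dim=1)  # (B, Hq, Wq)
```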
    Reference | Related Articles | Metrics
    Joint Dual-Dimensional User Scheduling for Adaptive Federated Edge Learning
    ZHANG Jiuchuan, PAN Chunyu, ZHOU Tianyi, LI Xuehua, DING Yong
    Computer Engineering and Applications    2024, 60 (4): 153-162.   DOI: 10.3778/j.issn.1002-8331.2209-0459
    Abstract33)      PDF(pc) (2402KB)(31)       Save
    Federated edge learning does not need to transmit local data, which greatly relieves pressure on the uplink while protecting user privacy. In federated edge learning, each intelligent edge device trains a local model on its local dataset and uploads the model parameters to a central server; the central server aggregates the uploaded local parameters to form and update a global model, then sends the updated model back to the edge devices to start a new iteration. However, local model accuracy and local training time significantly affect the global aggregation and model update process. Therefore, an adaptive dynamic batch gradient descent strategy is first proposed, which automatically adjusts the batch size used by gradient descent during local training and optimizes the local model accuracy and the convergence speed of federated learning. Next, to handle the non-IID characteristics of user data, an adaptive dynamic batch gradient descent algorithm combined with a dual-dimensional user scheduling strategy is designed, imposing constraints along the two dimensions of convergence time and data diversity. After training and testing on the MNIST, Fashion-MNIST and CIFAR-10 datasets, the algorithm effectively reduces the aggregation waiting time and further improves the global model accuracy and convergence speed. Compared with gradient descent with fixed batch sizes of 64, 128 and 256, the global model accuracy of this algorithm is increased by 32.4%, 45.2% and 87.5%, respectively, at 100 seconds of running time.
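    An illustrative adaptation rule only; the paper's actual criterion is more involved, and the training loop below simulates the local loss rather than running real SGD:

```python
import math

def adapt_batch(batch, prev_loss, loss, b_min=16, b_max=512):
    # Grow the mini-batch while the loss is still falling (larger batches give
    # steadier updates); shrink it when progress stalls to re-inject gradient noise.
    return min(batch * 2, b_max) if loss < prev_loss else max(batch // 2, b_min)

def local_train_epoch(batch, rnd):
    # Stand-in for one epoch of local SGD on an edge device; returns a fake loss.
    return 1.0 / (rnd + 1) + 0.01 * math.sqrt(batch) / 50

batch, prev = 64, float("inf")
for rnd in range(10):
    loss = local_train_epoch(batch, rnd)
    batch, prev = adapt_batch(batch, prev, loss), loss
    print(f"round {rnd}: loss={loss:.3f}, next batch={batch}")
```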
    Reference | Related Articles | Metrics
    Extreme Multi-Label Text Classification Based on Balance Function
    CHEN Zhaohong, HONG Zhiyong, YU Wenhua, ZHANG Xin
    Computer Engineering and Applications    2024, 60 (4): 163-172.   DOI: 10.3778/j.issn.1002-8331.2209-0472
    Abstract71)      PDF(pc) (2723KB)(46)       Save
    Extreme multi-label text classification is a challenging task in natural language processing. The labeled data in this task follows a long-tailed distribution, under which models learn tail-label classification poorly and the overall classification effect suffers. To address this problem, an extreme multi-label text classification method based on a balance function is proposed. Firstly, the BERT pre-trained model is used for word embedding. Then, the concatenated outputs of the multi-layer encoder of the pre-trained model are used as the text vector representation, which captures richer text semantics and speeds up model convergence. Finally, the balance function assigns different attenuation weights to the training losses of different predicted labels, improving the method's ability to learn tail-label classification. Experimental results on the Eurlex-4K and Wiki10-31K datasets show that the evaluation metrics P@1, P@3 and P@5 reach 86.95%, 74.12% and 61.43%, and 88.57%, 77.46% and 67.90%, respectively.
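    A hedged sketch of a balance-function-style loss: the effective-number weighting (Cui et al.) below is one common way to attenuate head-label losses, not necessarily the paper's exact scheme:

```python
import torch
import torch.nn.functional as F

def balanced_bce(logits, targets, label_freq, beta=0.999):
    """logits, targets: (B, L); label_freq: (L,) training-set count per label.
    Rare (tail) labels receive larger weights, so their losses decay less."""
    w = (1 - beta) / (1 - beta ** label_freq.float().clamp(min=1))  # (L,)
    w = w / w.mean()                                 # keep the overall loss scale
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (w * bce).mean()

# Usage with random stand-ins for a batch of 4 texts and 6 labels:
logits = torch.randn(4, 6)
targets = torch.randint(0, 2, (4, 6)).float()
freq = torch.tensor([5000, 1200, 300, 40, 8, 2])     # long-tailed label counts
loss = balanced_bce(logits, targets, freq)
```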
    Reference | Related Articles | Metrics
    Few-Shot Scene Classification with Attention Mechanism in Remote Sensing
    ZHANG Duona, ZHAO Hongjia, LU Yuanyao, CUI Jian, ZHANG Baochang
    Computer Engineering and Applications    2024, 60 (4): 173-182.   DOI: 10.3778/j.issn.1002-8331.2301-0012
    Abstract80)      PDF(pc) (2555KB)(68)       Save
    Remote sensing scene classification is a hot research topic in computer vision and is of great significance to the semantic understanding of remote sensing images. At present, deep learning based methods dominate this field, but they suffer from a lack of samples and poor model generalization in real application scenarios. Therefore, this paper proposes a few-shot remote sensing scene classification method based on an attention mechanism and designs a dual-branch similarity measurement structure. The method follows the meta-learning training strategy of dividing the dataset into tasks. Meanwhile, the input images are divided into blocks to preserve the feature distribution of the remote sensing images. A lightweight attention module is then introduced into the feature extraction network to reduce the risk of overfitting and ensure that discriminative features are obtained. Finally, a dual-branch similarity measurement module based on the earth mover's distance (EMD) is added to improve the discriminative ability of the classifier. The results show that, compared with classic few-shot learning methods, the proposed method significantly improves classification performance.
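    A minimal sketch of an EMD-based similarity between two images viewed as sets of patch embeddings, using the POT library (pip install POT); this is a generic EMD matcher under uniform patch weights, not the paper's exact dual-branch module:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def emd_similarity(f1, f2):
    """f1: (n, d), f2: (m, d) L2-normalised patch embeddings of two images.
    Returns a similarity in [-1, 1]; higher means more similar scenes."""
    cost = 1.0 - f1 @ f2.T                   # cosine distance matrix (n, m)
    a = np.full(len(f1), 1.0 / len(f1))      # uniform weight per patch
    b = np.full(len(f2), 1.0 / len(f2))
    return 1.0 - ot.emd2(a, b, cost)         # emd2 returns the transport cost

# Usage with random stand-ins for 9 patches of 64-dim features per image:
rng = np.random.default_rng(0)
f1 = rng.normal(size=(9, 64)); f1 /= np.linalg.norm(f1, axis=1, keepdims=True)
f2 = rng.normal(size=(9, 64)); f2 /= np.linalg.norm(f2, axis=1, keepdims=True)
print(emd_similarity(f1, f2))
```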
    Reference | Related Articles | Metrics
    Improving Detection and Positioning of Insulators in YOLO v7
    ZHANG Jianrui, WEI Xia, ZHANG Linxuan, CHEN Yannan, LU Jie
    Computer Engineering and Applications    2024, 60 (4): 183-191.   DOI: 10.3778/j.issn.1002-8331.2306-0094
    Abstract101)      PDF(pc) (2604KB)(92)       Save
    This paper addresses the low accuracy and high missed-detection rate caused by varying insulator sizes and background interference in power system object detection. Firstly, a convolutional block attention module (CBAM) is added to the YOLO v7 backbone so that the network attends to insulator features along both the channel and spatial dimensions, reducing missed detections. Secondly, a centralized feature pyramid (CFP) is added to the deeper layers of the network to allow information exchange and aggregation across feature maps of different scales, yielding more comprehensive insulator features and higher detection accuracy. Finally, the k-means algorithm is used to cluster the ground-truth boxes to obtain the most suitable anchor sizes for insulators. The experimental results show that the improved YOLO v7 network achieves a detection mAP (mean average precision) of 96.2%, a precision of 90.8%, and a recall of 93.8%. The improved method has broad application prospects in insulator detection for power systems.
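    A sketch of the anchor-clustering step: k-means over ground-truth box sizes with 1 - IoU as the distance, the standard way anchor sizes are derived for YOLO-style detectors (the box width/height data is assumed already extracted from the labels):

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """wh: (N, 2) array of ground-truth box (width, height) pairs.
    Returns k anchor sizes sorted by area."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # IoU between every box and every anchor, assuming shared top-left corner.
        inter = np.minimum(wh[:, None, 0], anchors[None, :, 0]) * \
                np.minimum(wh[:, None, 1], anchors[None, :, 1])
        iou = inter / (wh.prod(1)[:, None] + anchors.prod(1)[None] - inter)
        assign = iou.argmax(1)               # each box joins its best anchor
        anchors = np.array([wh[assign == i].mean(0) if (assign == i).any()
                            else anchors[i] for i in range(k)])
    return anchors[np.argsort(anchors.prod(1))]

# Usage with random stand-in boxes:
boxes = np.random.default_rng(1).uniform(10, 300, size=(500, 2))
print(kmeans_anchors(boxes))
```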
    Reference | Related Articles | Metrics