Content of Pattern Recognition and Artificial Intelligence in our journal

    Temporal Knowledge Graph Reasoning with Graph Reconstruction
    XU Zhihong, ZHANG Tianrun, WANG Liqin, DONG Yongfeng
    Computer Engineering and Applications    2024, 60 (9): 181-187.   DOI: 10.3778/j.issn.1002-8331.2212-0197
    To address the problem that most existing temporal knowledge graph reasoning algorithms rely on sequences of static knowledge graph snapshots and cannot adequately capture fine-grained temporal features, a graph reconstruction for temporal knowledge reasoning (GRTKR) model is designed. The model reconstructs the temporal knowledge graph by sampling the temporal neighbourhood of entities, and combines the explicit temporal features provided by the temporal encoder with the implicit temporal features provided by the neighbourhood feature aggregator to improve its ability to model temporal data. Experiments on the temporal knowledge graph datasets ICEWS14, ICEWS05-15, and YAGO11K validate the effectiveness of the method and show significant improvements in the MRR, Hits@1, Hits@3, and Hits@10 evaluation metrics over mainstream baseline models.
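The MRR and Hits@k metrics named in this abstract can be sketched as follows. This is a minimal illustration, assuming each test query yields the 1-based rank of the correct entity; the function names and the sample ranks are hypothetical and do not come from the paper.

```python
from typing import Sequence


def mrr(ranks: Sequence[int]) -> float:
    """Mean reciprocal rank over 1-based ranks of the correct entity."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


if __name__ == "__main__":
    ranks = [1, 2, 4]  # hypothetical ranks, for illustration only
    print("MRR:", mrr(ranks))
    for k in (1, 3, 10):
        print(f"Hits@{k}:", hits_at_k(ranks, k))
```

Higher values are better for both metrics; Hits@1 is the strictest, since it counts only queries where the correct entity is ranked first.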
    Short Text Classification Combined with Hyperbolic Graph Attention Networks and Labels
    SONG Jianping, WANG Yi, SUN Kaiwei, LIU Qilie
    Computer Engineering and Applications    2024, 60 (9): 188-195.   DOI: 10.3778/j.issn.1002-8331.2212-0335
    Existing methods fail to comprehensively consider the importance of text hierarchy and labels for text feature learning in text classification tasks, which limits their robustness and expressive power. To address this, a short text classification algorithm, L-HGAT, based on a hyperbolic graph attention network is proposed. Considering the compatibility between the hierarchical structure of text and the tree-like properties of hyperbolic space, the text is embedded into a hyperbolic space with constant negative curvature, making full use of the strong expressive power of hyperbolic manifold representations. A hyperbolic graph attention network is then designed, which combines node features and edge features to enhance the ability to aggregate key local information in the text. Finally, an interaction function between labels and text based on the geodesic distance in hyperbolic space is used to further guide text feature learning and thereby improve classification performance. Experimental results show that L-HGAT significantly outperforms existing methods on benchmark datasets and completes the text classification task more effectively.
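To make the geodesic distance concrete, here is a minimal sketch. It assumes the Poincaré ball model with curvature -1, which the abstract does not specify; the nearest-label rule is likewise a hypothetical stand-in for the paper's label-text interaction function.

```python
import math
from typing import Sequence


def _sq_norm(x: Sequence[float]) -> float:
    return sum(a * a for a in x)


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance in the Poincare ball model (curvature -1).

    Both points must lie strictly inside the unit ball.
    """
    nu, nv = _sq_norm(u), _sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = _sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


def nearest_label(text_vec, label_vecs) -> int:
    """Hypothetical rule: index of the label embedding closest to the text."""
    dists = [poincare_distance(text_vec, lab) for lab in label_vecs]
    return dists.index(min(dists))
```

Distances grow rapidly as points approach the boundary of the ball, which is what lets tree-like hierarchies embed with low distortion.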
    Chinese Long Text Classification Model Based on BERT Fused Chinese Input Methods and BLCG
    YANG Wentao, LEI Yuqi, LI Xingyue, ZHENG Tiancheng
    Computer Engineering and Applications    2024, 60 (9): 196-202.   DOI: 10.3778/j.issn.1002-8331.2212-0357
    Existing Chinese long text classification models do not take into account Chinese-specific feature information such as phonetics and morphology, so they cannot fully represent Chinese semantic information. Meanwhile, sentences containing much information that is either unrelated to the target topic or related to other topics lead to misjudgment by the classification model. To solve these problems, a Chinese long text classification model based on CIMBERT (BERT fused Chinese input methods) and BLCG (BiLSTM fused CNN with gate) is proposed. Firstly, text vector representations are obtained using a BERT model that incorporates Chinese input methods: Pinyin and Wubi, two widely used Chinese character input methods, are added to the input vector representations of BERT to enhance the semantic information of Chinese characters. Furthermore, BLCG is constructed to extract the overall features of texts, using LSTM (long short-term memory) to obtain global features and CNN (convolutional neural network) to acquire local features. The gating mechanism of BLCG dynamically combines global and local features, overcoming the classification errors caused by the inability to identify text on unrelated topics. Finally, the proposed method is tested on the THUCNews and Sogou datasets. The experimental results show classification accuracies of 97.63% and 95.43% and F1-scores of 97.68% and 95.49%, respectively, indicating that the proposed model is superior to other text classification models to some extent.
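The idea of a gate that dynamically mixes global and local features can be sketched as below. The abstract does not give the gate's formula, so this assumes a common element-wise form, g * global + (1 - g) * local with g = sigmoid(logit); in a trained model the logits would come from learned layers, whereas here they are passed in directly as a hypothetical input.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    global_feat: Sequence[float],
    local_feat: Sequence[float],
    gate_logits: Sequence[float],
) -> List[float]:
    """Element-wise convex combination of two feature vectors.

    A gate near 1 keeps the global (e.g. LSTM) feature; a gate near 0
    keeps the local (e.g. CNN) feature.
    """
    gates = [sigmoid(z) for z in gate_logits]
    return [g * a + (1.0 - g) * b
            for g, a, b in zip(gates, global_feat, local_feat)]
```

With all logits at zero the gate is 0.5 everywhere and the output is the plain average of the two vectors.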
    Discourse-Level Topic Segmentation Model with Multi-Level Information Enhanced Heterogeneous Graphs Network
    ZHANG Yangning, ZHU Jing, DONG Rui, YOU Zeshun, WANG Zhen
    Computer Engineering and Applications    2024, 60 (9): 203-211.   DOI: 10.3778/j.issn.1002-8331.2212-0363
    Topic segmentation is a basic task in natural language processing that divides a text into several semantically related blocks according to the principle of semantic correlation. However, existing topic segmentation models do not adequately extract the deep semantic information of sentences, and they ignore the hierarchical information and contextual interaction in the discourse. To solve these problems, this paper proposes MHG-TS, a discourse-level topic segmentation model that enhances heterogeneous graphs with multi-level information. MHG-TS constructs a heterogeneous graph network from the sentences and keywords in the discourse, and adopts the pre-trained language model BERT to capture the deep semantic features of the nodes in the graph. At the level of the first-order neighborhood, the model uses the graph attention mechanism to assign more weight to semantically associated nodes, which enhances their information interaction within the first-order neighborhood. At the level of keyword nodes, it uses keyword information to reinforce the representation of semantic features. At the level of the high-order neighborhood, it uses the keyword nodes as intermediaries to build cross-sentence information interaction and to enrich the non-sequential relationships between sentence nodes. Sentence representations containing global semantic information are finally obtained by integrating this multi-level information. Compared with the state-of-the-art model, the average values of MHG-TS on three evaluation metrics across multiple datasets increase by 3.08%, 2.56% and 5.92% respectively, giving the best experimental results.
    Aspect-Level Sentiment Analysis Based on Location-Enhanced Word Embeddings and GRU-CNN Model
    TAO Linjuan, HUA Gengxing, LI Bo
    Computer Engineering and Applications    2024, 60 (9): 212-218.   DOI: 10.3778/j.issn.1002-8331.2212-0375
    Aspect-level sentiment analysis aims to judge the emotional attitude toward a specific aspect word according to the given context. The core problem is how to accurately represent the context of the aspect word. Unlike existing research, which mainly focuses on improving the attention mechanism, this paper focuses on two aspects: word representation and context encoding. For word representation, a location-enhanced word representation is obtained through the BERT model and a position measurement formula. For context encoding, a GRU-CNN network is used to extract the semantic features of the text. Experiments on the SemEval2014 Task4 dataset show that the accuracy of the proposed model on the Restaurant and Laptop datasets reaches 85.54% and 80.35% respectively, which demonstrates the effectiveness of the proposed model.
    Attribute Distillation for Zero-Shot Recognition
    LI Houjun, WEI Boquan
    Computer Engineering and Applications    2024, 60 (9): 219-227.   DOI: 10.3778/j.issn.1002-8331.2212-0382
    Zero-shot recognition is one of the most challenging tasks in the field of computer vision. The key problem is how to learn stable and transferable knowledge from the seen classes. To increase the accuracy of zero-shot recognition, this paper investigates the problem and develops a straightforward and efficient attribute-distillation classifier based on the notion of knowledge distillation, consistent with how people generally understand things. The method first obtains extensive and precise visual features from the large model Vision Transformer, then uses the attribute idea to extract the attribute knowledge of objects before transferring to the task of classifying unseen classes. Experiments on public datasets demonstrate that the proposed method produces competitive results. Its recognition accuracy is slightly below that of the most recent attribute-guided algorithm, but it is still better than other conventional approaches, and its simple recognition architecture achieves fast processing speed. This research also shows that decreasing the sparsity of attribute descriptions and adding multi-view high-definition photos will help increase zero-shot recognition accuracy.
    Text Kernel Reconstruction and Expansion for Arbitrary Shape Text Detection
    DENG Shengjun, CHEN Niannian
    Computer Engineering and Applications    2024, 60 (9): 228-236.   DOI: 10.3778/j.issn.1002-8331.2301-0074
    Segmentation-based approaches to pixel-level text prediction in natural scenes have demonstrated significant improvement in the detection of arbitrary shape text. However, the separation of adjacent text remains a challenge in text detection. One common method for addressing this issue uses text kernels, which are obtained by shrinking the annotation boundaries, to separate adjacent instances. While this approach is effective in certain scenarios, it discards a significant amount of information outside the text kernel, which can degrade the performance of segmentation-based text detection methods. To address this limitation, a text kernel reconstruction algorithm is proposed that postpones the generation of text kernels to the post-processing stage. The proposed approach uses the direction field predicted by the network to contract text instances inward, forming text kernels. Additionally, a text kernel expansion algorithm is proposed to restore full text instances from the resulting text kernels. Experiments on the Total-Text, CTW-1500, and MSRA-TD500 datasets show that the proposed method achieves similar or superior detection performance compared to the state-of-the-art (88.66%, 87.28%, and 90.65% respectively).
    Chinese Short Text Classification with Hybrid Features and Multi-Head Attention
    JIANG Jielin, ZHU Yongwei, XU Xiaolong, CUI Yan, ZHAO Yingnan
    Computer Engineering and Applications    2024, 60 (9): 237-243.   DOI: 10.3778/j.issn.1002-8331.2302-0396
    Traditional short text classification methods have two shortcomings: they cannot fully represent the semantic information of the text, and they cannot effectively extract and integrate the global and local information of the text. To address this, a Chinese short text classification method with hybrid features and multi-head attention (HF-MHA) is proposed. The method uses a pre-trained model to calculate the character-level and word-level vector representations of Chinese short texts, obtaining a more comprehensive text feature representation. It then adopts a multi-head attention mechanism to capture the dependency relationships in the text sequence and improve semantic understanding of the text. A convolutional neural network extracts the features of the two vector representations separately and merges them into one feature vector, integrating the global and local information of the text. Finally, the classification result is obtained through the output layer. Experiments on three public datasets show that HF-MHA can effectively improve the performance of Chinese short text classification.
    Robotic Actions and Strategy Demonstration Learning Method for Constructing Primitive Library Ideas
    LI Tiejun, LIU Jiaqi, LIU Jinyue, JIA Xiaohui
    Computer Engineering and Applications    2024, 60 (8): 90-98.   DOI: 10.3778/j.issn.1002-8331.2211-0261
    To solve the problems of demonstration data optimization and of storing and calling action and task strategies in robot demonstration learning, a demonstration learning method based on a primitive library is proposed. For action learning, experts drag the manipulator through actions to obtain demonstration data. A Gaussian mixture model and Gaussian mixture regression are used to improve data quality, and the final demonstration data is converted into basis function weights by the dynamic motion primitive algorithm. For strategy learning, task steps are created as action primitives, the obtained weights are added to the primitives, a primitive business card containing the task execution strategy is built, and the primitive library is formed for storage. When executing tasks, the primitives are called sequentially from the primitive library. The YOLOv5 target detection network and the AlexNet image classification network are used to detect target information, to match actions, and to generalize new actions that retain the original action characteristics. The method learns actions and stores strategies from demonstrations, and combines appropriate actions to complete tasks according to actual goals. In an experiment on a steel bar binding scene, 5 action primitives are created and 10 basic actions are learned through expert teaching, and the robot successfully completes the binding task at the intersections of horizontal and vertical reinforcement using the action primitive library.
    E-TUP:Joint Knowledge Graph Learning Recommendation Method Incorporating E-CP and TUP
    ZHAO Bo, WANG Yujia, NI Ji
    Computer Engineering and Applications    2024, 60 (8): 99-109.   DOI: 10.3778/j.issn.1002-8331.2211-0464
    At present, most methods that introduce knowledge graphs into recommendation systems only use the known, surface-level knowledge graph entities, without predicting and mining the intrinsic relationships of the graphs, and thus cannot exploit the hidden relationships in the knowledge graphs. To address this problem, this paper proposes the joint learning recommendation model E-TUP (enhance towards understanding of user preference), in which E-CP (enhance canonical polyadic) is used to complete the knowledge graph and deliver the complete information. A storage-space negative sampling method is used to store and update high-quality negative triples during training, improving the quality of negative triples in knowledge graph completion. Experimental results on link prediction show that the storage-space approach improves the link prediction accuracy of the E-TUP model by up to 10.3% compared to existing models. Recommendation experiments on the MovieLens-1m and DBbook2014 datasets achieve the best results on several evaluation metrics, with up to 5.5% improvement, indicating that E-TUP can effectively exploit the hidden relationships in the knowledge graph to improve recommendation accuracy. Finally, recommendation experiments based on automotive maintenance data show that E-TUP can effectively recommend relevant knowledge.
    Improved Deeplabv3+ Crop Classification Method Based on Double Attention Fusion
    GUO Jin, SONG Tingqiang, SUN Yuanyuan, GONG Chuanjiang, LIU Yalin, MA Xinglu, FAN Haisheng
    Computer Engineering and Applications    2024, 60 (8): 110-120.   DOI: 10.3778/j.issn.1002-8331.2211-0468
    In recent years, convolutional neural networks (CNN) have made new progress in crop classification research, but they show limitations in modeling long-range dependencies and are deficient in capturing the global characteristics of crops. To address these problems, Transformer is introduced into the Deeplab v3+ model, and DeepTrans (Deeplab v3+ with Transformer), a model with a parallel branch structure for crop classification in drone images, is proposed. DeepTrans combines Transformer and CNN in parallel, which helps capture both global and local features effectively. The Transformer enhances the long-range dependencies among information in the image and improves the extraction of global crop information. A channel attention mechanism and a spatial attention mechanism are added to enhance the sensitivity of the Transformer to channel information and the ability of ASPP (atrous spatial pyramid pooling) to capture crop spatial information. The experimental results show that the MIoU of the DeepTrans model reaches 0.812, which is 3.9% higher than that of the Deeplab v3+ model. The accuracy of the model in classifying five crops is improved. For sugarcane, corn and banana, which are easily misclassified, IoU increases by 2.9%, 4.7% and 13% respectively. DeepTrans thus achieves better segmentation in the internal filling and global prediction of crop classification images, which helps monitor the planting structure and scale of farmland crops in a more timely and accurate manner.
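The IoU and MIoU figures reported in this abstract follow the usual segmentation definitions, which can be sketched from a confusion matrix. This is an illustrative sketch only: the function names are hypothetical, and the matrix layout (rows are true classes, columns are predicted classes) is an assumption.

```python
from typing import List, Sequence


def per_class_iou(conf: Sequence[Sequence[int]]) -> List[float]:
    """IoU per class from a square confusion matrix.

    conf[t][p] counts pixels of true class t predicted as class p.
    IoU_c = TP / (TP + FP + FN); classes with no pixels yield NaN.
    """
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # true c, predicted other
        fp = sum(conf[r][c] for r in range(n)) - tp   # predicted c, true other
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(conf: Sequence[Sequence[int]]) -> float:
    """Mean IoU over classes that actually occur (NaN entries are skipped)."""
    valid = [v for v in per_class_iou(conf) if v == v]
    return sum(valid) / len(valid)
```

For a two-class matrix `[[8, 2], [1, 9]]`, the per-class IoUs are 8/11 and 9/12, and MIoU is their average.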
    Approximate Markov Blanket Feature Selection Method Based on Lasso Fusion
    LIU Ming, DU Jianqiang, LI Zhiqin, LUO Jigen, NIE Bin, ZHANG Mengting
    Computer Engineering and Applications    2024, 60 (8): 121-130.   DOI: 10.3778/j.issn.1002-8331.2212-0094
    In feature selection, approximate Markov blankets are often used to judge redundant features, but the redundant features obtained are not identical. Consequently, directly deleting redundant features with approximate Markov blankets may lead to information loss and reduce model accuracy. To address this, an approximate Markov blanket feature selection method based on Lasso fusion is proposed for high-dimensional, small-sample data from traditional Chinese medicine metabonomics. The method is divided into two stages. In the first stage, irrelevant features are filtered out by analyzing feature correlation with the maximum information coefficient. In the second stage, approximate Markov blankets are used to construct similar feature groups, Lasso is used to evaluate the influence of the features in each group, and redundant features are removed iteratively. The experimental results show that the algorithm can reduce the loss of useful information, remove irrelevant and redundant features, and improve the accuracy and stability of the model.
    Cross-Modal Re-Identification Light Weight Network Combined with Data Enhancement
    CAO Ganggang, WANG Banghai, SONG Yu
    Computer Engineering and Applications    2024, 60 (8): 131-139.   DOI: 10.3778/j.issn.1002-8331.2212-0100
    Among existing cross-modal re-identification methods, lightweight networks have received little study. Considering the demand for lightweight networks in hardware deployment, a new lightweight cross-modal re-identification network is proposed. Based on OSNet, the feature extractor and feature embedder are split. At the same time, data enhancement operations are used to make maximal use of limited datasets and improve network robustness, and the hard triplet loss is improved to further reduce computation and narrow the difference between modalities, thereby improving identification accuracy. The network is lightweight, simple in structure and effective. In the all-search mode of the SYSU-MM01 dataset, the rank-1/mAP of the proposed method reaches 65.56% and 61.36% respectively, with only 1.92×10^6 parameters.
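For context on the hard triplet loss mentioned in this abstract, the sketch below shows the standard batch-hard triplet loss. It is not the paper's improved variant, whose details the abstract does not give; the margin value and function names are assumptions for illustration.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    margin: float = 0.3,  # assumed value, not taken from the paper
) -> float:
    """Standard batch-hard triplet loss.

    For each anchor, take its farthest same-identity sample (hardest
    positive) and its closest different-identity sample (hardest negative),
    then apply the hinge max(0, d_pos - d_neg + margin).
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == label]
        neg = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != label]
        if not pos or not neg:
            continue  # anchor has no valid triplet in this batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

In the cross-modal setting, a batch would typically mix visible and infrared embeddings of the same identities, so that the hardest positives often come from the other modality.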
    Model Robustness Enhancement Algorithm with Scale Invariant Condition Number Constraint
    XU Yangyu, GAO Baoyuan, GUO Jielong, SHAO Dongheng, WEI Xian
    Computer Engineering and Applications    2024, 60 (8): 140-147.   DOI: 10.3778/j.issn.1002-8331.2212-0114
    Deep neural networks are vulnerable to adversarial examples, which threatens their application in safety-critical scenarios. Based on the explanation that adversarial examples arise from the highly linear behavior of neural networks, a model robustness enhancement algorithm based on a scale-invariant condition number constraint is proposed. Firstly, the norms of all weight matrices are calculated during adversarial training, and the scale-invariant constraint term is obtained through a logarithmic function. Secondly, the scale-invariant condition number constraint term is incorporated into the outer framework of adversarial training optimization, and the condition numbers of all weight matrices are iteratively reduced through backpropagation, so that the linear transformations of the neural network are performed in a well-conditioned high-dimensional weight space, improving robustness against adversarial perturbations. The algorithm is suitable for visual models with both convolutional and Transformer architectures. It not only significantly improves robust accuracy against white-box attacks such as PGD and AutoAttack, but also effectively enhances adversarial robustness against black-box attack algorithms, including Square Attack. When the proposed constraint is incorporated during adversarial training of a Transformer-based image classification model, the condition numbers of the weight matrices drop by 20.7% on average, and robust accuracy against PGD attacks increases by 1.16 percentage points. Compared with similar methods such as Lipschitz constraints, the proposed method can also improve accuracy on clean examples and alleviate the low generalization caused by adversarial training.
    Algorithm Research Based on Multi-Feature Fusion of EEG Signals with Convolutional Neural Networks
    SONG Shilin, ZHANG Xuejun
    Computer Engineering and Applications    2024, 60 (8): 148-155.   DOI: 10.3778/j.issn.1002-8331.2212-0301
    To address the low classification accuracy of motor imagery electroencephalogram (EEG) signals, a feature extraction algorithm based on the fusion of sample entropy and common spatial pattern (CSP) features is proposed. The algorithm first performs wavelet packet decomposition on the raw EEG signal and selects the components containing μ and β rhythms for reconstruction. The sample entropy and CSP features of the reconstructed signal are then extracted separately. These two features are fused into a new feature vector, which is recognized by a one-dimensional convolutional neural network designed in this paper to obtain the classification result. The proposed method achieves a classification accuracy of 91.66% on the BCI Dataset III in 2003 and an average classification accuracy of 85.29% on the BCI Dataset A in 2008. Compared with multi-feature fusion algorithms proposed in recent literature, the accuracy is improved by 7.96 percentage points.
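Sample entropy, one of the two features fused here, can be sketched in a few lines. This follows the commonly used definition, SampEn = -ln(A/B), with the Chebyshev distance and an absolute tolerance `r`; the parameter defaults are assumptions, and the paper's exact settings are not stated in the abstract.

```python
import math
from typing import Sequence


def sample_entropy(x: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """Sample entropy of a 1-D signal.

    B counts pairs of length-m templates within tolerance r (Chebyshev
    distance), A counts the same for length m + 1; self-matches are
    excluded. Returns inf when no matches are found.
    """
    n = len(x)

    def count_matches(length: int) -> int:
        # Use n - m templates for both lengths so the counts are comparable.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b)
                           for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)
```

Perfectly regular signals give a value of zero, and less predictable signals give larger values. In practice `r` is often set relative to the signal's standard deviation, which this sketch leaves to the caller.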
    Joint Entity Relation Extraction Model Based on Interactive Attention
    HAO Xiaofang, ZHANG Chaoqun, LI Xiaoxiang, WANG Darui
    Computer Engineering and Applications    2024, 60 (8): 156-164.   DOI: 10.3778/j.issn.1002-8331.2301-0154
    The quality of entity-relation triple extraction directly affects the later construction of knowledge graphs. Traditional pipeline and joint extraction models do not effectively model the semantic features at the sentence level and relation level, which limits model performance. To this end, RSIAN, a joint entity and relation extraction model that fuses the semantic features at the sentence level and relation level, is proposed. It learns the higher-order semantic associations at the sentence level and relation level through an interactive attention network to enhance the interaction between sentences and relations and assist the model in extraction decisions. On the Chinese tourism dataset (TDDS) constructed in this paper, the precision, recall, and F1 values are 0.872, 0.760, and 0.812, respectively, all of which outperform the current mainstream models. To further validate the performance of the model on joint extraction in English, experiments are conducted on the publicly available English datasets NYT and WebNLG. The F1 values of the model are 0.014 and 0.013 higher than those of the baseline RSAN model, respectively, and the model also outperforms the baseline in the analysis experiments on overlapping triples.
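The reported F1 value can be checked against the reported precision and recall, since F1 is their harmonic mean. The helper name below is hypothetical; the numbers are those quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2.0 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Precision and recall as reported for the TDDS dataset.
    print(round(f1_score(0.872, 0.760), 3))
```

The result rounds to 0.812, consistent with the F1 value given in the abstract.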
    Bi-Bi-Modality with Bi-Gated Fusion in Multimodal Sentiment Analysis
    LIU Qingwen, Mairidan·Wushouer, Gulanbaier·Tuerhong
    Computer Engineering and Applications    2024, 60 (8): 165-172.   DOI: 10.3778/j.issn.1002-8331.2302-0088
    To balance the uneven distribution of emotional information across modalities and obtain a deeper multimodal emotional representation, this paper proposes a method called bi-bi-modality with bi-gated fusion in multimodal sentiment analysis (BBBGF). When fusing the text-vision and text-audio modalities, the dominant position of the text modality among the three modalities is fully considered. At the same time, dual fusion is used to obtain multimodal emotional interaction information at a deeper level. In the first fusion, a fusion gate decides how much knowledge of the supplementary modality is added to the main modality, yielding two bi-modality hybrid knowledge matrices. In the second fusion, considering the redundant and repeated information in the two bi-modality hybrid knowledge matrices, a selection gate selects effective and non-repeating emotional information as the final knowledge. On the public dataset CMU-MOSEI, the accuracy and F1 value of binary sentiment classification reach 86.2% and 86.1%, respectively, showing good robustness and advanced performance.
    Multiview Interaction Learning Network for Multimodal Aspect-Level Sentiment Analysis
    WANG Xuyang, PANG Wenqian, ZHAO Lijie
    Computer Engineering and Applications    2024, 60 (7): 92-100.   DOI: 10.3778/j.issn.1002-8331.2210-0288
    Previous multimodal aspect-level sentiment analysis methods only use the general text and picture representations of the pre-trained model, which are insensitive to the correlation between aspect and opinion words, and they cannot dynamically obtain the contribution of picture information to word representation, so they cannot fully recognize the correlation between modalities and aspects. To address these problems, a multiview interaction learning network is proposed. To make full use of the global features of the text in multimodal interaction, sentence features are extracted from the context view and the syntax view respectively. The relationships among text, picture and aspect are then modeled to realize multimodal interaction. At the same time, the interactive representations of different modalities are fused to dynamically obtain the contribution of visual information to each word in the text, and the correlation between modalities and aspects is fully extracted. Finally, the sentiment classification results are obtained through the fully connected layer and Softmax layer. Experiments on two datasets show that this model can effectively improve multimodal aspect-level sentiment classification.
    Recommendation for Reducing Unrelated Neighborhoods by Combining Project Attribute Collaboration Signals
    ZHAO Wentao, XUE Saili, LIU Tiantian
    Computer Engineering and Applications    2024, 60 (7): 101-107.   DOI: 10.3778/j.issn.1002-8331.2211-0042
    In recommendation systems, a knowledge graph (KG) is used as auxiliary information to improve the performance and interpretability of the algorithm. However, when aggregating multi-hop neighbors, existing methods usually aggregate and propagate all entity information. Not all information in the KG helps improve recommendation results, and when neighborhood information is aggregated without differentiation, entity embeddings are disturbed by unrelated entities. To address these problems, this paper proposes RUNCS, a model that combines item attribute collaborative signals with a strategy for screening highly relevant neighborhoods, to improve recommendation. Specifically, users who have clicked on the same item are first defined as similar neighbors, and the collaborative set of item attributes is obtained by combining the items clicked by similar neighbors with the item attributes in the KG. Secondly, the similarity of item attributes is calculated to obtain a correlation score, which is used to screen highly correlated neighbors. Finally, an attention mechanism assigns weights to the screened neighbors and aggregates their information. Experimental results on two benchmark datasets, music and film, show that compared with the best existing mainstream methods, the AUC of CTR prediction by this model increases by 0.6 to 2.7 percentage points.
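The screen-then-aggregate idea in this abstract can be illustrated with a small sketch. It assumes cosine similarity as the correlation score and a softmax over the same scores as the attention weights; both choices, along with the function names and the `top_k` parameter, are hypothetical simplifications rather than the paper's actual formulation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def aggregate_relevant_neighbors(
    target: Sequence[float],
    neighbors: Sequence[Sequence[float]],
    top_k: int = 2,
) -> List[float]:
    """Keep the top_k most similar neighbors, then softmax-weight them."""
    # Step 1: screening. Rank neighbors by similarity to the target.
    kept = sorted(neighbors, key=lambda n: cosine(target, n),
                  reverse=True)[:top_k]
    scores = [cosine(target, n) for n in kept]
    # Step 2: attention. Softmax over scores (shifted for stability).
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Step 3: weighted sum of the kept neighbor vectors.
    return [sum(w * n[d] for w, n in zip(weights, kept))
            for d in range(len(target))]
```

Neighbors that fall outside the top `top_k` contribute nothing to the result, which is the point of screening before aggregation.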
    Research on Gesture Recognition Based on Improved YOLOv5 and Mediapipe
    NI Guangxing, XU Hua, WANG Chao
    Computer Engineering and Applications    2024, 60 (7): 108-118.   DOI: 10.3778/j.issn.1002-8331.2308-0097
    Existing gesture recognition algorithms suffer from heavy computation and poor robustness. In this paper, a gesture recognition method based on the IYOLOv5-Med (improved YOLOv5 Mediapipe) algorithm is proposed. The algorithm combines an improved YOLOv5 algorithm with the Mediapipe method and consists of gesture detection and gesture analysis. In the gesture detection part, the traditional YOLOv5 algorithm is improved. Firstly, the C3 module is reconstructed by FastNet. Secondly, the CBS module is replaced by the GhostConv module from GhostNet. Thirdly, the SE attention mechanism module is introduced at the end of the Backbone network. The improved algorithm has a smaller model size and is better suited to resource-limited edge devices. In the gesture analysis part, a method based on Mediapipe is proposed: the key points of the hand are detected in the gesture area located by the gesture detection part, the relevant features are extracted, and the gesture is then identified by a naive Bayes classifier. The experimental results confirm the effectiveness of the IYOLOv5-Med algorithm. Compared with the conventional YOLOv5 algorithm, the parameters are reduced by 34.5%, the computation is reduced by 34.9%, and the model weight is decreased by 33.2%. The final average recognition rate reaches 0.997, and the implementation is relatively simple, giving the method good application prospects.
    Temporal Event Prediction Based on Implicit Relationship of Multiple Sequences
    HAO Zhifeng, LIU Jun, WEN Wen, CAI Ruichu
    Computer Engineering and Applications    2024, 60 (7): 119-127.   DOI: 10.3778/j.issn.1002-8331.2211-0137
    Temporal event prediction refers to predicting the next event based on historical events, where each event has a time attribute and a type attribute. Current work focuses on one-sided prediction (event time or event type), which cannot answer more detailed questions such as “when did something happen”. Two challenges arise. First, event types are very diverse and behavior is often highly sparse, which makes prediction very difficult. Second, event time and event type belong to two different domains, and combining the information of these two domains is also a challenge. In response to these challenges, an approach is explored from the perspective of fusing the hidden information of multiple sequences. Firstly, based on the observation that certain event sequences share similar patterns, a hidden relationship graph of event sequences is modeled, and the information of neighboring sequences is used to alleviate behavioral sparsity. Secondly, a suitably designed neural network module maps the time-domain and type-domain information of events to a common abstract space, which solves the fusion modeling problem of event time and event type. Extensive experiments on several real datasets corroborate that the multiple sequence deep temporal model outperforms a series of existing benchmark models.
    Demand Aware Attention Graph Neural Network for Session-Based Recommendation
    ZHENG Xiaoli, WANG Wei, DU Yuxuan, ZHANG Chuang
    Computer Engineering and Applications    2024, 60 (7): 128-140.   DOI: 10.3778/j.issn.1002-8331.2211-0248
    Existing graph-based session recommendation methods ignore the noise caused by the uncertainty of user behavior in the feedback data and therefore cannot capture user preferences accurately and effectively. To address this, a demand aware attention graph neural network for session-based recommendation (DAAGNNSR) model is proposed. Firstly, the time-ordered session data is constructed as a graph, and node embeddings on the graph are learned by a graph neural network. Secondly, a demand aware aggregator linearly aggregates the extracted item features into a user potential demand matrix to automatically attenuate noise interference, while a low-rank multi-head attention network interacts with all item features item by item to generate a demand-enhanced item representation. Thirdly, joint independent position coding further analyzes the sequential association between the items, and the resulting independent position embedding is linearly fused with the item representation. Finally, a ranked recommendation list is generated by the prediction layer. The proposed model is trained and tested on three common datasets, Diginetica, Tmall and Nowplaying. The experimental results show that the recommendation accuracy of the model is better than that of other baseline models on all metrics. Compared with the graph context self-attention network for session-based recommendation (GCSAN), NDCG@10 on Diginetica is improved by 5.6% and Recall@10 on Tmall is increased by 6.4%. Compared with SRGNN, which is based on graph neural networks, Precision@10 on Tmall is improved by 5.0%, a significant improvement in recommendation performance.
    Applying Attention Transformer Module to 3D Lip Sequence Identification
    PIAN Xinyang, WANG Yu, ZHANG Jie
    Computer Engineering and Applications    2024, 60 (7): 141-146.   DOI: 10.3778/j.issn.1002-8331.2211-0295
    Lip behavior is a newly emerging biometric recognition technology, and 3D lip point cloud sequences have become an important biometric feature for individual identification because they contain real lip spatial structure and motion information. The disorder and unstructured characteristics of the 3D point cloud, however, make the extraction of spatio-temporal features very difficult. To this end, a deep learning network model based on point feature Transformer is proposed for 3D lip sequences identification. This network uses an improved four-layer PointNet++ as the backbone of the network to extract features in a layered manner. And, an attention Transformer module with dynamic lip features is designed and added behind each layer of the PointNet++ network in order to learn more spatio-temporal features containing identity information, which is beneficial to learn the relevant information among different feature maps and to capture effectively contextual information among different video sequence frames. Compared with the Transformers constructed by other attention mechanisms, the Transformer module proposed in this paper has fewer parameters, and experimental results on the S3DFM-FP and S3DFM-VP datasets show that the proposed network model is effective for the identification task of 3D lip point cloud sequences. Even on the S3DFM-VP dataset, which is not constrained by pose, the proposed network model shows better performance.
    Conformer-Based Speaker Recognition Model for Real-Time Multi-Scenarios
    XUAN Xi, HAN Runping, GAO Jingxin
    Computer Engineering and Applications    2024, 60 (7): 147-156.   DOI: 10.3778/j.issn.1002-8331.2210-0145
    To address the poor performance of speaker verification systems in multiple scenarios involving cross-domain, long-duration and noisy utterances, a real-time robust speaker recognition model, PMS-Conformer, is designed based on Conformer in this paper. The architecture of PMS-Conformer is inspired by the state-of-the-art model MFA-Conformer. PMS-Conformer improves the acoustic feature extractor, network components and loss calculation module of MFA-Conformer, yielding a novel and effective acoustic feature extractor and a robust speaker embedding extractor with high generalization capability. PMS-Conformer is trained on the VoxCeleb1&2 dataset and compared with the baselines MFA-Conformer and ECAPA-TDNN in extensive experiments on speaker verification tasks. The experimental results show that on VoxMovies with cross-domain utterances, SITW with long-duration utterances, and VoxCeleb-O with noise added to its utterances, the ASV system built with PMS-Conformer is more competitive than those built with MFA-Conformer and ECAPA-TDNN. Moreover, the number of trainable parameters and the RTF of the speaker embedding extractor of PMS-Conformer are significantly lower than those of ECAPA-TDNN. All evaluation results demonstrate that PMS-Conformer performs well in real-time multi-scenario settings.
    Image-Guided Augmentation Visual Question Answering Model Combined with Contrastive Learning
    YANG You, YAO Lu
    Computer Engineering and Applications    2024, 60 (7): 157-166.   DOI: 10.3778/j.issn.1002-8331.2211-0447
    To address two problems of existing attention-based encoder-decoder visual question answering (VQA) models, an image-guided augmentation VQA model combined with contrastive learning (IGA-CL) is proposed. The first problem is that a single type of image feature contains incomplete visual information; the second is that existing models rely too heavily on question guidance. To solve the first problem, the dual-feature visual decoder (DFVD) is proposed. It is based on the Transformer language encoder. After the single image feature is extended into two types, region and grid, visual information is refined by constructing complementary spatial relations based on the relative positions of the different feature types. To solve the second problem, the vision-guided language decoder (VGLD) is proposed. It matches the two decoded image features with the question features twice. Within it, the parallel gated guided-attention (PGGA) is designed to adaptively correct the guiding proportions of different image features to the question. To obtain more similar mutual information, a contrastive learning loss function is introduced during training, which compares the similarity of different modal features in the hidden space during model reasoning. The proposed model achieves 73.82%, 72.49% and 57.44% overall accuracy on VQA 2.0, COCO-QA and GQA, respectively, which is 2.92, 4.41 and 0.8 percentage points better than the MCAN model. Extensive ablation experiments and visualization analysis demonstrate the effectiveness of the proposed model. Experimental results show that the proposed model can obtain more relevant language-vision information and has stronger generalization ability for different types of question samples.
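    The abstract does not give the form of the contrastive loss. As a rough illustration of how such a loss compares features from two modalities, the sketch below uses the widely known InfoNCE formulation with cosine similarity; this choice, the `temperature` value and the function names are assumptions, not details taken from the paper. Feature vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(image_feats, text_feats, temperature=0.1):
    """Mean InfoNCE loss; pair i of image_feats/text_feats is the positive pair."""
    n = len(image_feats)
    total = 0.0
    for i in range(n):
        logits = [
            cosine(image_feats[i], text_feats[j]) / temperature for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidates in the batch.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denom - logits[i]
    return total / n
```

    When matching image and question features point in the same direction the loss is close to zero, and it grows when the positive pairs are less similar than the mismatched ones.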
    Medical Report Extraction Generation Model Integrated with BioCopy Mechanism
    LIU Lan, TAN Hongye
    Computer Engineering and Applications    2024, 60 (6): 155-162.   DOI: 10.3778/j.issn.1002-8331.2210-0071
    Wise information technology of med (WITMED) is a new health care service mode that integrates information technologies such as artificial intelligence. Automatic generation of medical reports is an important task in this field: it generates semi-structured medical reports from patient self-reports and doctor-patient dialogue. A medical report not only contains sub-parts such as the chief complaint, but also contains a large number of medical terms from the original text. In view of these characteristics, a summarization model that integrates extraction and abstraction with a BioCopy mechanism is adopted. Firstly, the model extracts key sentences for each sub-part to eliminate the interference of irrelevant information. Then, the BioCopy mechanism is added when generating the medical report to copy the medical terms in the key sentences and ensure the accuracy of the results. The experimental results on the CCL 2021 datasets show that this model outperforms the main baselines.
    Deep Neural Network Channel Pruning Compression Method for Filter Elasticity
    LI Ruiquan, ZHU Lu, LIU Yuanyuan
    Computer Engineering and Applications    2024, 60 (6): 163-171.   DOI: 10.3778/j.issn.1002-8331.2210-0420
    Deep neural networks (DNNs) have achieved great success in various fields. Due to their high computing and storage costs, it is difficult to deploy them directly on resource-constrained mobile devices. To solve this problem, the importance evaluation of filters across the whole network is studied, and a channel pruning compression method with filter elasticity is proposed to reduce the size of the neural network. Firstly, the method sets local dynamic thresholds between layers to remedy the over-pruning of L1 regularization (L1 lasso) sparse training. Then, the output of each convolution layer is multiplied by a channel scaling factor, and this module replaces the ordinary convolution layer module. The global importance of a filter is defined by its elasticity, whose values are estimated by a Taylor expansion and ranked. At the same time, a new iterative filter pruning framework is designed to balance pruning performance against pruning speed. Finally, the improved L1 regularization training and the global filter importance are combined to prune channels. With the proposed method, VGG-16 on CIFAR-10 has its floating-point operations (FLOPs) reduced by 80.2% and its parameter count reduced by 97.0% without significant loss of accuracy. This indicates that the method is effective, can compress neural networks substantially, and allows them to be deployed on resource-constrained terminal devices.
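    The abstract does not define the elasticity measure precisely. A common first-order Taylor criterion for channel importance, used here purely as an assumed stand-in, scores each channel by the absolute product of its scaling factor and the gradient of the loss with respect to that factor. The function names and the `prune_ratio` parameter below are hypothetical.

```python
def taylor_importance(scales, grads):
    """First-order Taylor score |scale * dLoss/dscale| for each channel."""
    return [abs(s * g) for s, g in zip(scales, grads)]


def channels_to_prune(scales, grads, prune_ratio):
    """Return the indices of the least important channels, in ascending order."""
    scores = taylor_importance(scales, grads)
    # Rank channels from least to most important.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_prune = int(len(scores) * prune_ratio)
    return sorted(order[:n_prune])
```

    For four channels with scaling factors `[0.9, 0.01, 0.5, 0.02]` and gradients `[0.1, 0.2, 0.3, 0.1]`, a `prune_ratio` of 0.5 selects channels 1 and 3, whose scores are the smallest.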
    Research on Angle-Optimised Grasp Detection Algorithm Based on YOLOv5
    CHEN Chunchao, SUN Donghong
    Computer Engineering and Applications    2024, 60 (6): 172-179.   DOI: 10.3778/j.issn.1002-8331.2210-0499
    To address the problems that current robot grasp detection methods predict the grasping angle too coarsely and that the grasping process may produce a large angular deviation, which reduces grasp detection accuracy and can even lead to grasp failure, an improved real-time robot grasp detection method based on the YOLOv5 neural network model is proposed. Firstly, the grasping frame coordinates and grasping angles are extracted based on the single-stage object detection model YOLOv5. Afterwards, the grasping angles are divided more finely, the circular smoothing label is introduced to accommodate the periodicity of the angles and establish links between adjacent angles, the YOLOv5 detection head is decoupled, and the loss function is optimized to improve the detection accuracy. Finally, experimental validation is performed on the Cornell dataset. The experimental results show that the proposed algorithm predicts the grasping angle better and improves grasp detection accuracy compared with classical grasp detection methods. The model achieves 97.5% accuracy and a detection speed of 71 FPS on the Cornell dataset.
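    The idea of a circular label over angle bins can be sketched as below. The sketch assumes a Gaussian window and 180 one-degree bins; the bin count, the `sigma` value and the function name are illustrative assumptions and are not taken from the paper.

```python
import math


def circular_smooth_label(angle_bin, num_bins=180, sigma=6.0):
    """Soft label over angle bins that wraps around at the ends.

    The target bin gets weight 1.0; neighboring bins get smaller weights
    that depend on the circular distance to the target bin.
    """
    label = []
    for i in range(num_bins):
        d = abs(i - angle_bin)
        d = min(d, num_bins - d)  # circular distance, so bin 179 is next to bin 0
        label.append(math.exp(-(d * d) / (2.0 * sigma * sigma)))
    return label
```

    Because the distance is circular, bins 1 and 179 receive the same weight when the target is bin 0, which is how the label links adjacent angles across the wrap-around point.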
    Dynamic Dominant Fusion Multimodal Sentiment Analysis Method Based on Autoencoder
    YANG Xi, GUO Junjun, YAN Haining, TAN Kaiwen, XIANG Yan, YU Zhengtao
    Computer Engineering and Applications    2024, 60 (6): 180-187.   DOI: 10.3778/j.issn.1002-8331.2211-0010
    In multimodal sentiment analysis, the modality that plays a dominant role in sentiment determination is dynamic. Traditional multimodal sentiment analysis methods usually regard the text modality as the dominant modality, but ignore that the dominant modality changes at different moments due to the differences between modalities. To select the dominant modality dynamically at each moment, this paper proposes a dynamic dominant fusion multimodal sentiment analysis method based on an autoencoder. The method first encodes the single modalities and obtains multimodal fusion features. An autoencoder is then applied to map them into a shared space, in which the dominant modality is selected by the correlation between each unimodal feature and the fusion feature. Finally, the dominant modality information is used to guide multimodal fusion to obtain a robust multimodal representation. Extensive experiments on the multimodal sentiment analysis benchmark dataset CMU-MOSI demonstrate the effectiveness of the proposed method, which outperforms most existing state-of-the-art multimodal sentiment analysis methods.
    Medical Named Entity Recognition Based on Multi-Feature and Co-Attention
    LIU Xinning
    Computer Engineering and Applications    2024, 60 (6): 188-198.   DOI: 10.3778/j.issn.1002-8331.2211-0094
    Aiming at the situation that the accuracy of entity recognition cannot be effectively improved due to the lack of fusion of unique feature information of medical texts in current Chinese medical named entity recognition, and the problem that single attention mechanism affects the effectiveness of entity classification, a Chinese medical named entity recognition method based on multi-feature fusion and co-attention mechanism is proposed. Firstly, the vector representation of the original medical text is obtained by using the pre-trained model, and then the feature vectors of word granularity are obtained by using the bidirectional gated recurrent neural network (BiGRU). Secondly, combined with the distinctive radical features of medical named entities, iterative dilation convolution neural network (IDCNN) is used to extract radical-level feature vectors. Finally, the co-attention network is used to integrate medical vector features to generate double correlation features of <Characters-Radicals> pair, and then conditional random field (CRF) is used to output entity recognition results. The experimental results show that, compared with other entity recognition models, it can achieve higher accuracy, recall and F1 value on the CCKS dataset. At the same time, although the complexity of the recognition model is increased, the performance does not decrease significantly.
    Chinese Named Entity Recognition Methods Combined with Entity Boundary Cues
    HUANG Rong, CHEN Yanping, HU Ying, HUANG Ruizhang, QIN Yongbin
    Computer Engineering and Applications    2024, 60 (6): 199-206.   DOI: 10.3778/j.issn.1002-8331.2211-0119
    As a basic task in information extraction, named entity recognition (NER) can provide effective support for machine translation, relation extraction and other downstream tasks, and is of great research significance. To tackle the problem of fuzzy entity boundaries in Chinese named entity recognition methods, a named entity recognition model combining entity boundary cues is proposed. The model is composed of three modules: boundary detection, cue generation and entity classification. Firstly, the entity boundary detection module is used to identify the entity boundaries. Then, entity spans are generated according to the boundary information in the cue generation module, and the text sequence with boundary cue labels is obtained. Through the boundary cue labels, the model can perceive the entity boundaries in the sentence and learn the semantic dependencies between entity boundaries and context. Finally, the text sequence with boundary cue labels is used as the input of the entity classification module, the semantic interaction between labels is enhanced by the Biaffine mechanism, and the joint prediction of a multilayer perceptron and the Biaffine mechanism is taken as the entity recognition result. The F1 values of this model on the ACE2005 Chinese dataset and the Weibo dataset reach 90.47% and 73.54% respectively, which verifies the effectiveness of the model for Chinese named entity recognition.
    Visual Question Answering Research on Joint Knowledge and Visual Information Reasoning
    SU Zhenqiang, GOU Gang
    Computer Engineering and Applications    2024, 60 (5): 95-102.   DOI: 10.3778/j.issn.1002-8331.2209-0456
    As a task in the multimodal field, visual question answering requires fusion and reasoning over the features of different modalities, and has important application value. In traditional visual question answering, the answer to the question can be reasoned well by relying only on the visual information of the image. However, pure visual information cannot meet the diverse question-answering needs in real-world scenarios. Knowledge plays an important role in visual question answering and can assist it well. Knowledge-based open visual question answering needs to correlate external knowledge to achieve cross-modal scene understanding. In order to better integrate visual information and related external knowledge, a bilinear structure for joint knowledge and visual information reasoning is proposed, and a dual-guided attention module, in which image features and question features guide the knowledge representation, is designed. Firstly, the model uses a pre-trained vision-language model to obtain the feature representation and visual reasoning information of the question and image. Secondly, a similarity matrix is used to calculate the image object area under the semantic alignment of the question, and the aligned regional features together with the question features jointly guide the knowledge representation to obtain knowledge reasoning information. Finally, the visual reasoning information and the knowledge reasoning information are fused to get the final answer. The experimental results on the OK-VQA dataset show that the accuracy of the model is 1.97 percentage points and 4.82 percentage points higher than the two baseline methods, respectively, which verifies the effectiveness of the model.
    Cross-Domain Face in Vivo Detection of Unilateral Adversarial Network Algorithm
    ZENG Fanzhi, WU Chutao, ZHOU Yan
    Computer Engineering and Applications    2024, 60 (5): 103-111.   DOI: 10.3778/j.issn.1002-8331.2210-0134
    In existing cross-domain face detection algorithms, the feature extraction process is prone to overfitting and lacks feature aggregation, resulting in insufficient generalization. To solve this problem, this paper proposes a unilateral adversarial network algorithm for cross-domain face in vivo detection. Firstly, grouping convolution and an improved reciprocal residual structure are fused to replace ordinary convolution, which reduces network parameters and enhances the expression of fine-grained face features, and an adaptive feature normalization module is introduced that emphasizes the face in vivo information region and fades the irrelevant background region in the image. This effectively avoids overfitting in the merging of live face information and enhances the ability of face detection across different source domains. Secondly, based on NetVLAD, a channel attention mechanism module is introduced. As a branch of the feature aggregation network, the channel attention mechanism module learns the semantic information of local features in different source domains, effectively enhancing the generalization ability of face live information classification across source domains. Finally, a two-module fusion network is designed to improve the accuracy of cross-domain face detection in unknown scenes. Experimental results on the OULU-NPU, CASIA-FASD, MSU-MFSD, and Idiap Replay-Attack datasets show that the proposed algorithm performs well in the cross-dataset tests O&C&M to I, O&C&I to M, I&C&M to O, and O&M&I to C. Among them, the accuracy on O&C&I to M and O&M&I to C is improved by 0.99 percentage points and 0.5 percentage points respectively.
    Multi-View Representation Model for Aspect-Level Sentiment Analysis
    XU Xuefeng, HAN Hu
    Computer Engineering and Applications    2024, 60 (5): 112-121.   DOI: 10.3778/j.issn.1002-8331.2210-0231
    The fine-grained sentiment analysis of user comments for specific aspects is a popular research topic in the field of natural language processing. For the flexibility of comment statements in content expression and syntactic structure, the integrated use of lexical, syntactic and semantic knowledge to enhance the feature representation of comment statements is a major research idea at present. Based on this, a graph convolutional network model for multi-view fusion representation is proposed in this paper. First, the model learns to obtain context-based enhanced representations of comment statements through self-attention and aspect-specific attention. Second, two different representations of comment utterances based on syntax and semantics are obtained through graph convolution operations using syntactic dependency information and word co-occurrence information, respectively. Finally, a hierarchical fusion approach is designed based on obtaining three different view representations to achieve information sharing and complementarity among different view representations by combining and convolving the three representations. Experimental results on five publicly available datasets show that the model achieves better performance than existing models.
    Reverse Inference Model for Document-Level Event Extraction
    JI Wanting, MA Yuhang, LU Wenyi, WANG Junlu, SONG Baoyan
    Computer Engineering and Applications    2024, 60 (5): 122-129.   DOI: 10.3778/j.issn.1002-8331.2210-0237
    Event extraction aims to detect event types and extract event arguments from unstructured texts. Existing methods still have limitations when dealing with document-level texts. This is because a document-level text may consist of multiple events, and the event arguments that constitute an event are usually scattered across different sentences. To address the above challenges, this paper proposes a reverse inference model for document-level event extraction (RIDEE). Based on the design without trigger words, RIDEE simplifies the document-level event extraction into two sub-tasks, candidate event argument extraction and event triggering inference, to extract event arguments in parallel and detect event types. In addition, this paper designs an event dependency pool for storing historical events, so that the model can make full use of the dependencies between events when processing the multi-event texts. Experimental results on the public dataset show that RIDEE has better performance in document-level event extraction than the existing event extraction models.
    Bidirectional Interaction Model for Joint Multiple Intent Detection and Slot Filling
    LI Shi, SUN Zhenpeng
    Computer Engineering and Applications    2024, 60 (5): 130-138.   DOI: 10.3778/j.issn.1002-8331.2210-0271
    Intent detection and slot filling are the two major tasks of spoken language understanding; they are highly correlated and are usually trained jointly. As research on spoken language understanding has progressed, it has been found that users’ utterances in real-life scenarios often contain multiple intents. However, some joint models can only detect a single intent in user utterances and fail to adequately model the correlation between multiple intents and slots. Since the information of multiple intents in an utterance can guide slot filling, and the slot information can in turn help detect intents better, the Label Bi-Interaction model uses a graph attention network to establish a two-way interaction between intents and slots. Specifically, the Label Bi-Interaction model associates the two tasks bidirectionally so that the model can explore the relationship between multiple intents and slots, and introduces the label information of the two tasks to enable the model to learn the relationship between utterance context and labels. This improves the accuracy of intent detection and slot filling and optimizes the overall performance of spoken language understanding. Experiments show that the performance of the model on the two multi-intent datasets MixATIS and MixSNIPS is significantly improved compared to other models.
    Personalized Dynamically Ensemble for Alzheimer’s Disease Auxiliary Diagnostics Model
    LIANG Haolin, PAN Dan, ZENG An, YANG Baoyao, Xiaowei Song
    Computer Engineering and Applications    2024, 60 (5): 139-145.   DOI: 10.3778/j.issn.1002-8331.2211-0150
    Aiming at the problem that most of the Alzheimer’s disease (AD) classification models do not develop specific strategies for input samples, resulting in the easy neglect of personalized differential information between samples, a novel AD classification model, namely personalized dynamically ensemble convolution neural network (PDECNN), is proposed. Considering the difference in degeneration degree of brain regions between input samples, PDECNN involves an attention-net to evaluate the degeneration degree of each brain region specific to the input sample. Based on the estimated results of the attention-net, a dynamic ensemble strategy is newly designed to select and fuse brain region features for AD identification. In addition, by redesigning the loss function, the problem that the optimal gradient of unselected brain regions cannot be obtained is solved, thus improving the AD classification performance. The experimental results show that compared with AD classification models, the classification accuracy of PDECNN in the AD vs. HC (healthy cognition), MCIc (mild cognitive impairment who will convert to AD) vs. HC, and MCIc vs. MCInc (mild cognitive impairment who will not convert to AD) experiments can be increased by 4%, 11%, and 8%, respectively. The experimental results also find that the degenerate brain regions identified by the PDECNN correlate with AD’s clinical manifestations.
    Prompt-Learning Inspired Approach to Unsupervised Sentiment Style Transfer
    CAI Guoyong, LI Anqing
    Computer Engineering and Applications    2024, 60 (5): 146-155.   DOI: 10.3778/j.issn.1002-8331.2211-0317
    Text style transfer is the task of rewriting text so that it has certain desired style properties while preserving the original content. To improve transfer quality with non-parallel style corpora, this paper proposes a new method that guides a fill-mask model to rewrite a sentence into the target style. Overall, the approach is based on the delete-retrieve-generate style transfer framework, but employs a large unsupervised pre-trained language model and the Transformer architecture. Following the working principle of the Transformer, the method for filtering style attributes from the source sentence is first improved, and then the internal knowledge of the pre-trained model is mined by prompt learning to generate the target style words. Experiments on two sentiment benchmark datasets show that the method outperforms existing editing methods, with an average improvement of more than 14% in relative scores on the comprehensive metrics.
    Data Reconstruction Based on Quantum Generative Adversarial Networks
    JIANG Yida, WANG Mingming
    Computer Engineering and Applications    2024, 60 (5): 156-164.   DOI: 10.3778/j.issn.1002-8331.2211-0363
    Data reconstruction using neural networks is an important research topic in the field of artificial intelligence. The generative adversarial network (GAN), a popular artificial intelligence algorithm in recent years, performs well in data reconstruction tasks. As a new computing mode that can accelerate classical computing, quantum computing is increasingly being merged with classical artificial intelligence algorithms. Among them, the pure quantum generative adversarial network (QGAN) performs well in image-related tasks. However, since the fitting ability of the quantum model still needs to be improved, this paper proposes a hybrid generative adversarial network (Q-CGAN) based on the GAN framework to realize the data reconstruction task. The framework exploits classical nonlinearities to improve fitting performance and quantum properties to provide quantum speedups. The reconstruction performance of the hybrid model is compared and verified on the MNIST handwritten digit dataset, and the results show that Q-CGAN performs better in data reconstruction than pure quantum generators. In addition, the effect of different quantum encoding schemes and different parameterized quantum circuits in the hybrid model on data reconstruction is also studied.
    Oversampling Method for Imbalanced Data Using Credible Counterfactual
    GAO Feng, SONG Mei, ZHU Yi
    Computer Engineering and Applications    2024, 60 (5): 165-171.   DOI: 10.3778/j.issn.1002-8331.2211-0413
    A new oversampling method for imbalanced datasets based on counterfactuals (CF) is proposed, which further removes synthetic samples that lack credibility and aims to solve the problem that traditional sampling methods cannot make full use of the dataset information. Its core idea is to synthesize new samples based on the original instance features of the dataset. Compared with traditional oversampling interpolation methods, it can fully mine the boundary decision information in the data, so as to provide more useful information for the classifier and improve classification performance. Extensive comparative experiments have been carried out on 9 KEEL and UCI imbalanced datasets with 5 different classifiers (SVM, DT, Logistic, RF, AdaBoost) and 4 traditional oversampling methods (SMOTE, B1-SMOTE, B2-SMOTE, ADASYN). The results show that the algorithm achieves higher AUC, F1 and G-mean values and can effectively address the class imbalance problem.
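    For context, the traditional interpolation-based oversampling that the abstract compares against can be sketched as below. This is a simplified SMOTE-style baseline, not the counterfactual method proposed in the paper; the function name and parameters are illustrative, and at least two minority samples are assumed.

```python
import random


def smote_like_sample(minority, k=3, rng=None):
    """Create one synthetic minority sample by interpolating toward a near neighbor.

    minority: list of feature vectors (lists of floats) from the minority class.
    """
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    # Sort the remaining minority points by squared Euclidean distance to the base.
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbor = rng.choice(others[:k])
    t = rng.random()  # interpolation factor in [0, 1)
    return [a + t * (b - a) for a, b in zip(base, neighbor)]
```

    Each synthetic point lies on the line segment between two existing minority samples, which is why such methods cannot generate information beyond what interpolation between known instances provides.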