Content of Pattern Recognition and Artificial Intelligence in our journal

    Method for Generating Summary of Judgment Documents Based on Trial Logic Steps
    YU Shuai, SONG Yumei, QIN Yongbin, HUANG Ruizhang, CHEN Yanping
    Computer Engineering and Applications    2024, 60 (4): 113-121.   DOI: 10.3778/j.issn.1002-8331.2209-0142
    Judicial summarization of judgment documents is a key technology for improving the analysis of such documents. As the carrier of trial activities, a judgment document accurately presents the trial logic of the case. However, current summarization methods focus only on the sequential information of judgment documents and ignore their logical structure, so they cannot effectively handle overly long texts and redundant information. A judgment document summary generation method based on trial logic steps is proposed, adopting an “extraction + generation” approach. The extraction part uses multi-label classification to extract four sentence sets, “type, claim, fact and result”, according to the logic steps of the people's court. The generation part produces the summary with a fine-tuned T5-PEGASUS model. In addition, the input text of the “fact” part is denoised with a maximum similarity matching algorithm based on internal knowledge, which further improves the summary quality. Experimental results show that, compared with the mainstream pointer-generator network summarization model, the proposed method improves the F1 scores of ROUGE-1, ROUGE-2 and ROUGE-L by 17.99, 21.24 and 21.86 percentage points, respectively. This shows that introducing logical structure into judicial summarization can improve task performance.
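The abstract above reports gains in ROUGE F1. As a rough illustration of what that metric measures, the sketch below computes ROUGE-N F1 from clipped n-gram overlap between two token lists. The function name `rouge_n_f1` and the whitespace tokenization are assumptions made for this example; the paper's own evaluation setup (for instance, character-level tokens for Chinese text) is not stated in the abstract.

```python
from collections import Counter


def rouge_n_f1(candidate, reference, n=1):
    """F1 of clipped n-gram overlap between two token lists (illustrative helper)."""

    def ngrams(tokens):
        # Count every contiguous n-gram; an input shorter than n yields no n-grams.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical summaries, tokenized on whitespace for simplicity.
candidate = "the court rejects the claim".split()
reference = "the court upholds the claim".split()
rouge1 = rouge_n_f1(candidate, reference, n=1)  # 4 of 5 unigrams match
rouge2 = rouge_n_f1(candidate, reference, n=2)  # 2 of 4 bigrams match
```

ROUGE-L, also reported above, uses the longest common subsequence instead of fixed-length n-grams and is not covered by this sketch.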
    Speech Emotion Recognition for Imbalanced Datasets
    ZHANG Huiyun, HUANG Heming
    Computer Engineering and Applications    2024, 60 (4): 122-132.   DOI: 10.3778/j.issn.1002-8331.2209-0099
    Sample balance is crucial for machine learning. On imbalanced datasets, the importance of certain classes may be higher than their sample counts suggest. This paper studies imbalanced datasets for speech emotion recognition. Firstly, the imbalanced baseline datasets EMODB and IEMOCAP are augmented with different signal-to-noise ratios to construct the datasets EMODBM and IEMOCAPM. Secondly, six techniques, namely SMOTE, RandomOverSampler, SMOTEENN, ADASYN, TomekLinks and SMOTETomek, are adopted to resample the baseline datasets, and the augmented datasets are constructed to achieve category balance. Thirdly, 21-dimensional low-level descriptor features are extracted from the baseline datasets and the augmented datasets. Finally, a novel model, MA-CapsNet, is proposed to validate the effectiveness of the resampling techniques. The results show that all types of emotion samples are basically balanced after resampling, which makes the learning of MA-CapsNet fairer. In addition, MA-CapsNet is more robust on the resampled datasets.
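Of the six resampling techniques named in the abstract above, random oversampling is the simplest to illustrate. The sketch below duplicates randomly chosen minority-class samples until every class matches the largest one. It is a standard-library illustration of the idea only; `random_oversample` is a hypothetical helper, not the API of any resampling library, and the class names are made up.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Draw with replacement only the number of extra samples this class needs.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Hypothetical toy data: 5 "angry" utterances and 2 "sad" ones, identified by index.
samples = list(range(7))
labels = ["angry"] * 5 + ["sad"] * 2
balanced_samples, balanced_labels = random_oversample(samples, labels)
class_counts = Counter(balanced_labels)  # both classes now have 5 samples
```

Methods such as SMOTE and ADASYN synthesize new minority samples by interpolation rather than duplication, so they need feature vectors instead of opaque sample identifiers.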
    Cross-Modality Person Re-identification Combined with Data Augmentation and Feature Fusion
    SONG Yu, WANG Banghai, CAO Ganggang
    Computer Engineering and Applications    2024, 60 (4): 133-141.   DOI: 10.3778/j.issn.1002-8331.2209-0120
    The difficulty of visible-infrared person re-identification lies in the large modal difference between images. Most existing methods alleviate this difference by generating fake images with generative adversarial networks or by extracting modality-shared features from the original images. However, training a generative adversarial network consumes substantial computational resources and the generated fake images are prone to introducing noise, while extracting modality-shared features can lose important discriminative features. To address these problems, a new cross-modality person re-identification network is proposed. Firstly, automatic data augmentation is used to improve model robustness. Then, instance regularization is used in the network to reduce modal differences. Finally, the pedestrian features of different scales extracted by each layer of the network are fused, so that the fused features contain more discriminative features related to pedestrian identity. The proposed method achieves Rank-1/mAP of 69.47%/65.05% in the all-search mode of SYSU-MM01 and Rank-1/mAP of 85.73%/77.77% in the visible-to-infrared mode of RegDB. The experimental results show a significant improvement.
    Efficient Cross-Domain Transformer Few-Shot Semantic Segmentation Network
    FANG Hong, LI Desheng, JIANG Guangjie
    Computer Engineering and Applications    2024, 60 (4): 142-152.   DOI: 10.3778/j.issn.1002-8331.2209-0156
    Few-shot semantic segmentation aims to learn target features and complete the semantic segmentation task using only a few labeled samples. The main problems in mainstream research are low training efficiency and the fact that meta-training and meta-testing take place in the same data domain. For this task, this paper proposes SGFNet, an efficient, cross-domain few-shot semantic segmentation network based on Transformer. In the encoding layer, a weight-sharing MixVisionTransformer is used to build a siamese network that extracts image features from the support set and the query set. In the relationship calculation layer, the Hadamard product of the support set image feature vector and its corresponding mask is calculated to extract the target feature maps, and the relationship between them and the query set image features is then calculated. In the decoder layer, the MLP decoder is improved and a residual decoder is proposed to decode features of different hierarchies and obtain the final segmentation result. Experiments show that training on the FSS-1000 dataset with a single 3090 GPU for 1.5~4.0 h yields the best 1-shot mIoU of 87.0%. In cross-domain tests on PASCAL-5i and COCO-20i, the model achieves results comparable to non-cross-domain methods, with 1-shot mIoU of 60.4% and 33.0%, respectively, which shows that the model is efficient and works across domains.
    Joint Dual-Dimensional User Scheduling for Adaptive Federated Edge Learning
    ZHANG Jiuchuan, PAN Chunyu, ZHOU Tianyi, LI Xuehua, DING Yong
    Computer Engineering and Applications    2024, 60 (4): 153-162.   DOI: 10.3778/j.issn.1002-8331.2209-0459
    Federated edge learning does not need to transmit local data, which greatly reduces the pressure on the uplink while protecting user privacy. In federated edge learning, each intelligent edge device trains a local model on its local dataset and then uploads the model parameters to the central server; the central server aggregates the uploaded local model parameters into a global model, updates it, and sends the updated model back to the intelligent edge devices to start a new iteration. However, the local model accuracy and local training time have a significant impact on global model aggregation and the model update process. Therefore, an adaptive dynamic batch gradient descent strategy is first proposed, which automatically adjusts the batch size used by gradient descent during local model training and optimizes the local model accuracy and the convergence speed of federated learning. Next, to handle the non-IID characteristics of user data, an adaptive dynamic batch gradient descent algorithm combined with a two-dimensional user scheduling strategy is designed, with the two dimensions of constraint being convergence time and data diversity. In training and testing on the MNIST, Fashion-MNIST and CIFAR-10 datasets, the algorithm effectively reduces the aggregation waiting time and further improves the global model accuracy and convergence speed. Compared with gradient descent with fixed batch sizes of 64, 128 and 256, the global model accuracy of this algorithm is increased by 32.4%, 45.2% and 87.5%, respectively, when running for 100 seconds.
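The aggregation step described in the abstract above, where the central server combines uploaded local parameters into a global model, can be sketched as a weighted average. Weighting each client by its number of local samples follows the common FedAvg convention and is an assumption here, since the abstract does not state the weighting. `federated_average` is a hypothetical helper that treats each model as a flat list of floats.

```python
def federated_average(client_params, client_sizes):
    """Average flat parameter lists, weighting each client by its local sample count."""
    if len(client_params) != len(client_sizes):
        raise ValueError("one sample count is required per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical round: two edge devices with 1 and 3 local samples.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
# global_params == [2.5, 3.5]
```

The adaptive batch size and the two-dimensional user scheduling proposed in the paper determine which clients reach this step and when. They are not modeled in this sketch.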
    Extreme Multi-Label Text Classification Based on Balance Function
    CHEN Zhaohong, HONG Zhiyong, YU Wenhua, ZHANG Xin
    Computer Engineering and Applications    2024, 60 (4): 163-172.   DOI: 10.3778/j.issn.1002-8331.2209-0472
    Extreme multi-label text classification is a challenging task in natural language processing. In this task, the labeled data follow a long-tailed distribution, so the model learns tail labels poorly, which degrades the overall classification performance. To address this problem, an extreme multi-label text classification method based on a balance function is proposed. Firstly, the BERT pre-trained model is used for word embedding. Further, the concatenated output of multiple encoder layers in the pre-trained model is used as the text vector representation, which captures richer semantic information and improves the model's convergence speed. Finally, the balance function assigns different attenuation weights to the training losses of different predicted labels, which improves the method's ability to learn tail labels. Experimental results show that P@1, P@3 and P@5 reach 86.95%, 74.12% and 61.43% on Eurlex-4K, and 88.57%, 77.46% and 67.90% on Wiki10-31K, respectively.
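The P@1, P@3 and P@5 figures in the abstract above are precision-at-k scores. A minimal sketch of the metric for a single document follows, assuming labels are identified by their index in the score list. `precision_at_k` is an illustrative name, not taken from the paper.

```python
def precision_at_k(scores, true_labels, k):
    """Fraction of the k highest-scoring label indices that are in true_labels."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:k] if i in true_labels) / k


# Hypothetical scores over five labels; labels 0 and 4 are the correct ones.
scores = [0.9, 0.1, 0.8, 0.3, 0.7]
true_labels = {0, 4}
p1 = precision_at_k(scores, true_labels, 1)  # top-1 is label 0: 1/1
p3 = precision_at_k(scores, true_labels, 3)  # top-3 is labels 0, 2, 4: 2/3
p5 = precision_at_k(scores, true_labels, 5)  # all five labels: 2/5
```

A dataset-level P@k is usually the mean of this value over all test documents.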
    Few-Shot Scene Classification with Attention Mechanism in Remote Sensing
    ZHANG Duona, ZHAO Hongjia, LU Yuanyao, CUI Jian, ZHANG Baochang
    Computer Engineering and Applications    2024, 60 (4): 173-182.   DOI: 10.3778/j.issn.1002-8331.2301-0012
    Remote sensing scene classification is a hot research topic in computer vision and is of great significance to the semantic understanding of remote sensing images. At present, remote sensing scene classification methods based on deep learning dominate this field. However, they suffer from a lack of samples and poor model generalization in real application scenarios. Therefore, this paper proposes a few-shot remote sensing scene classification method based on an attention mechanism and designs a dual-branch similarity measurement structure. The method uses a meta-learning training strategy to divide the dataset into tasks. Meanwhile, the input images are divided into blocks in order to preserve the feature distribution in the remote sensing image. Then a lightweight attention module is introduced into the feature extraction network to reduce the risk of overfitting and ensure that discriminative features are acquired. Finally, a dual-branch similarity measurement module based on the earth mover’s distance (EMD) is added to improve the discriminative ability of the classifier. The results show that, compared with classic few-shot learning methods, the proposed method significantly improves classification performance.
    Improving Detection and Positioning of Insulators in YOLO v7
    ZHANG Jianrui, WEI Xia, ZHANG Linxuan, CHEN Yannan, LU Jie
    Computer Engineering and Applications    2024, 60 (4): 183-191.   DOI: 10.3778/j.issn.1002-8331.2306-0094
    This paper aims to address the problems of low accuracy and a high missed detection rate caused by varying insulator sizes and background interference in the target detection task of power systems. Firstly, a convolutional block attention module (CBAM) is added to the YOLO v7 backbone network so that the model pays more attention to insulator features in both the channel and spatial dimensions, reducing the missed detection rate in insulator detection. Secondly, a concentrated feature pyramid (CFP) is added to the deeper layers of the network to allow feature maps at different scales to exchange and aggregate information, thus obtaining more comprehensive insulator features and improving insulator detection accuracy. Finally, the k-means algorithm is used to cluster the anchor boxes (preselected frames) to obtain the most suitable anchor box sizes for insulators. Experimental results show that the improved YOLO v7 network model has a detection mAP (mean average precision) of 96.2%, a precision of 90.8%, and a recall of 93.8%. The improved method has broad application prospects in insulator detection for power systems.
    Stock Prediction Method Combining Graph Convolution and Convolution Self-Attention
    TIAN Hongli, CUI Yao, YAN Huiqiang
    Computer Engineering and Applications    2024, 60 (4): 192-199.   DOI: 10.3778/j.issn.1002-8331.2210-0050
    With the continuous development of China's stock market, the trend of a stock is often affected by the development of the upstream and downstream industries of its enterprise. To address the shortcoming that mainstream stock prediction models ignore the correlations between stocks, a stock trend prediction model fusing graph convolution and long convolution self-attention is proposed. Firstly, the relationship matrix of multiple associated stocks is calculated using the correlation coefficient, and a graph convolutional network combined with this relationship matrix is used to extract the features of the associated stocks. Secondly, multi-head convolutional self-attention is used to extract long-term features. Finally, the polynomial expansion framework for classification loss functions is used to optimize the loss function for trend prediction. Experimental results show that the proposed model is superior to the gated recurrent unit, the temporal convolutional network and other models in terms of accuracy, precision, recall and F1 score.
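The first step in the abstract above builds a relationship matrix between stocks from a correlation coefficient. The sketch below assumes the Pearson coefficient computed over equal-length series (for example daily returns). Both the choice of Pearson and the choice of input series are assumptions, since the abstract does not specify them. The names `pearson` and `relation_matrix` are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def relation_matrix(series):
    """Symmetric matrix of pairwise correlations, one row and column per stock."""
    return [[pearson(a, b) for b in series] for a in series]


# Hypothetical series for three stocks: the second moves with the first,
# the third moves against it.
matrix = relation_matrix([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]])
```

Such a matrix could serve as a weighted adjacency matrix for a graph convolutional layer. How the paper thresholds or normalizes it is not stated in the abstract.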
    Brain-Inspired Learning Model for EEG Diagnosis of Depression
    ZENG Haochen, HU Bin, GUAN Zhihong
    Computer Engineering and Applications    2024, 60 (3): 157-164.   DOI: 10.3778/j.issn.1002-8331.2209-0077
    Depression is a global mental illness. Conventional diagnostic methods depend mainly on rating scales and the subjective assessment of doctors, which cannot guarantee effective identification of symptoms and carries a risk of misdiagnosis. By using physiological signals, deep learning methods are expected to improve diagnostic methods that lack a physiological basis. Traditional deep learning methods, however, rely on huge computing power, and most of them are end-to-end networks. These methods also lack physiological interpretability, which limits their clinical application in auxiliary diagnosis. This paper proposes a brain-inspired learning model for electroencephalogram (EEG) diagnosis of depression. At the functional level, a spiking neural network is constructed to classify depressed and healthy individuals with an accuracy of more than 97.5%, with lower energy consumption than deep convolutional methods. At the structural level, the spatial topology of brain connectivity is established using complex networks, and its graph characteristics are analyzed to uncover the underlying mechanism of abnormal brain functional connectivity in individuals with depression.
    Correlation Filtering Target Tracking Algorithm Based on Nonlinear Spatio-Temporal Regularization
    JIANG Wentao, WANG Deqiang, ZHANG Shengchong
    Computer Engineering and Applications    2024, 60 (3): 165-176.   DOI: 10.3778/j.issn.1002-8331.2208-0409
    To address the problem that the tracking model tends to drift during target tracking and cannot robustly track targets with diverse morphological changes, a correlation filtering target tracking algorithm based on nonlinear spatio-temporal regularization is proposed according to the laws of biological visual perception. Firstly, a temporal regularization term for nonlinear filter update that is close to the power law of human visual perception is introduced into the objective function. Compared with the fixed temporal regularization term in STRCF (spatio-temporal regularized correlation filter), the temporal regularization term updated by the nonlinear filter adapts to temporal changes in the tracked target, and the algorithm complexity is reduced by the alternating direction method of multipliers (ADMM). Then, nonlinear HOG (histogram of oriented gradient) features are extracted and scale adaptation is performed using log-polar coordinates that conform to biological mapping. Finally, occlusion anomaly detection is performed according to the relationship between the maximum response value and the average peak-to-correlation energy, which reduces the probability of model drift and enhances the anti-occlusion ability of the algorithm. Experimental results show that the accuracy and success rate of the algorithm on the OTB2015 dataset are 89.8% and 83.3%, respectively. Compared with STRCF, the proposed algorithm improves the accuracy rate by 2.5% and the success rate by 3.2%. In the per-attribute comparison over 11 attributes on OTB2013 and OTB2015, the proposed algorithm has higher accuracy and stronger robustness under interference from rotation, low resolution, background clutter, illumination change and other factors.
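The occlusion check in the abstract above relies on the average peak-to-correlation energy (APCE) of the filter response map. The sketch below uses the commonly cited definition, APCE = (F_max − F_min)² / mean((F − F_min)²), applied to a response map stored as nested lists. The thresholds the paper applies to APCE and to the maximum response are not given in the abstract, so none are assumed here.

```python
def apce(response):
    """Average peak-to-correlation energy of a 2-D response map (list of rows)."""
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    # Mean squared deviation of every response value from the minimum.
    energy = sum((v - f_min) ** 2 for v in values) / len(values)
    if energy == 0:
        return 0.0  # completely flat map: no peak at all
    return (f_max - f_min) ** 2 / energy


# Hypothetical response maps: one sharp peak versus a smeared, multi-peak response.
sharp = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
smeared = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
apce_sharp = apce(sharp)      # about 9.0
apce_smeared = apce(smeared)  # about 1.8
```

A single clean peak gives a high APCE, while occlusion typically flattens or fragments the response and lowers it, which is why the value is useful as a reliability signal before updating the model.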
    Entity Extraction of Adverse Drug Reaction on Social Media Based on Tri-training
    HE Zhongbo, YAN Xin, XU Guangyi, ZHANG Jinpeng, DENG Zhongying
    Computer Engineering and Applications    2024, 60 (3): 177-186.   DOI: 10.3778/j.issn.1002-8331.2208-0433
    Because social media data are produced in real time, making full use of them can compensate for the delay in extracting adverse drug reaction entities from traditional medical literature. However, social media texts suffer from problems such as high labeling cost and noise, which make it difficult for models to perform well. To address the high cost of labeling the large unlabeled corpora found in social media, the Tri-training semi-supervised method is used to extract adverse drug reaction entities. Unlabeled data are annotated by Transformer+CRF, BiLSTM+CRF and IDCNN+CRF, and the training set is then iteratively expanded using a consistency evaluation function. Finally, the output labels of the models are integrated through weighted voting. To address the informality of social media texts (heavy colloquialism, typos, etc.), this paper extracts richer semantic information by merging vectors of two granularities as the input of the model embedding layer. Experimental results show that the proposed model achieves good performance on the dataset obtained from the “Good Doctor Online” website.
    Lightweight Network ICA-Res2Net for Cervical Cell Classification
    ZHANG Peng, XIE Li, YANG Hailin
    Computer Engineering and Applications    2024, 60 (3): 187-195.   DOI: 10.3778/j.issn.1002-8331.2209-0014
    To solve the problems of low accuracy and poor real-time performance in cervical cell classification, this paper proposes an improved coordinate attention (ICA) module and designs a lightweight deep convolutional neural network, ICA-Res2Net, by combining it with the new residual structure Res2Net and a spatial pyramid pooling layer. Firstly, cross-convolution between feature sub-blocks of the Res2Net network is adopted to extract finer-grained information in the feature layer. Then, spatial pyramid pooling is used to extract local regional features, so that features can be extracted effectively without increasing the number of training parameters. The improved lightweight attention module is further introduced to weight each pixel in the feature layer through operations such as horizontal pooling and vertical pooling, which strengthens important detailed features and helps the network locate the objects of interest. In addition, to effectively prevent degradation of the deep network, the proposed ICA-Res2Net retains the skip connection design of the residual network, and the network parameters are trained by combining the Softmax loss function with the center loss function to improve classification accuracy. When the proposed lightweight network is applied to classify cervical cell images in the SIPaKMeD public dataset, the test classification accuracy reaches 98.65%, and the network has far fewer training parameters than classic networks such as ResNet50 and DenseNet121, which significantly improves the efficiency of cervical cell image classification.
    Implicit Sentiment Classification Model Based on Enhancement of Sentiment Features Oriented to Chinese Text
    TAN Guangpu, ZHU Guangli, WEI Siyu
    Computer Engineering and Applications    2024, 60 (3): 196-204.   DOI: 10.3778/j.issn.1002-8331.2209-0026
    The semantic features of implicit sentiment sentences cannot be deeply mined because of the lack of explicit sentiment words, which inevitably affects classification accuracy. To solve this problem, this paper proposes an implicit sentiment classification model for Chinese text based on the enhancement of sentiment features, named CISC. To improve classification accuracy, positive and negative sentiment lexicons are constructed, and sentiment words are embedded at the corresponding positions to obtain sentences with enhanced sentiment features. Firstly, the sentences are preprocessed to obtain the corresponding word sequences. Then, the positions for sentiment words are located through self-attention, and positive and negative words are embedded respectively. The corresponding positive and negative sentence representations are obtained through hierarchical attention networks. Next, these sentence representations are fed into the Bi-GRU models and AOA to obtain the corresponding feature vectors. Finally, the feature vectors are fed into Softmax to obtain the sentiment tendency. Further, the positive sentiment probability is calculated for each sentence incorporating positive words, and the negative sentiment probability is calculated similarly for each sentence incorporating negative words. The final sentiment tendency is obtained by comparing the average positive and negative sentiment probabilities. Experiments on the SMP-ECISA2019 public dataset show that the proposed CISC model achieves higher classification performance on Chinese implicit sentiment text than the EBA and GGBA models.
    Light Graph Convolution Recommendation Method Based on Residual Network
    TANG Yu, WU Zhendong
    Computer Engineering and Applications    2024, 60 (3): 205-212.   DOI: 10.3778/j.issn.1002-8331.2209-0027
    To solve the problems of imperfect message propagation links and redundant final node representations in existing recommendation models based on graph convolution networks, this paper proposes a light graph convolution network recommendation model based on a residual network (ResLightGCN). Firstly, the residual structure is employed to establish a message propagation network between adjacent layers of the same node, which expands the information propagation path. Secondly, the final node representation is optimized from a semantic point of view, that is, the graph convolution layer without message propagation is not considered. Finally, ResLightGCN is evaluated on four public datasets. Experimental results demonstrate that the proposed model outperforms multiple existing baseline models. In particular, on the Yelp and Amazon_Books datasets, ResLightGCN improves NDCG@10 by 16.2% and 15.8%, respectively, over the best baseline model.
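The improvements in the abstract above are measured in NDCG@10. A minimal sketch of NDCG@k with binary relevance for a single user follows. Binary relevance and the name `ndcg_at_k` are assumptions made for illustration, as the abstract does not describe the evaluation protocol in detail.

```python
import math


def ndcg_at_k(ranked_items, relevant_items, k):
    """NDCG@k with binary relevance: hits are discounted by log2(rank + 1)."""
    # enumerate() starts at 0, so position p corresponds to rank p + 1.
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical recommendations for one user whose only relevant item is "a".
ndcg_top = ndcg_at_k(["a", "b", "c"], {"a"}, k=3)     # hit at rank 1
ndcg_second = ndcg_at_k(["b", "a", "c"], {"a"}, k=3)  # hit at rank 2
```

The reported metric would normally be the mean of this per-user value over all test users.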
    Cross-Model Named Entity Recognition in Pictures for Procurement Documents
    YANG Sai, LIU Xin, YU Shaowen
    Computer Engineering and Applications    2024, 60 (3): 213-219.   DOI: 10.3778/j.issn.1002-8331.2303-0045
    Digital and intelligent procurement in the smart supply chain can improve procurement efficiency and save substantial labor costs. Procurement documents include a large number of files such as certificates and qualifications. To deal with uneven text layout and unclear scanned images, this paper designs an end-to-end cross-modal named entity recognition model based on deep learning, O2V2BLC (OCR-Vector-Bi-LSTM-CRF), to detect named entities in images. The model defines continuous text character boundaries for the characters recognized by optical character recognition, maps each character within a boundary to a vector, uses a bidirectional long short-term memory network (Bi-LSTM) to capture the contextual semantics of the character sequence within the boundary, calculates the state matrix of the characters, and obtains the globally optimal label sequence by constraining the character label sequence with conditional random fields. The prediction error of named entities is calculated on the training set, and the parameters of the O2V2BLC model are dynamically optimized. Applied to images in procurement documents, such as qualification certificates, the method can effectively identify the bidding unit, expert name, professional name and other named entities. Compared with the conditional random field, the hidden Markov model and the Bert-Bi-LSTM-CRF model, it improves the accuracy of entity recognition and supports digital and intelligent procurement in the intelligent supply chain.
    Detection of X-Ray Contraband by Adaptive and Multi-Scale Feature Fusion
    SUN Jia’ao, DONG Yishan, GUO Jingyuan, LI Mingze, LI Shuaichao, LU Shuhua
    Computer Engineering and Applications    2024, 60 (2): 96-102.   DOI: 10.3778/j.issn.1002-8331.2210-0240
    To resolve the problems of spatial multi-scale variation, background interference and model complexity in X-ray security inspection images of contraband, a lightweight YOLOv5 model with spatial adaptation and multi-scale feature fusion is proposed. Taking YOLOv5 as the basic framework, the adaptive spatial feature fusion mechanism is introduced to suppress the influence of feature scale differences, and bidirectional weighted feature fusion is integrated through the bidirectional feature pyramid network; a lightweight channel attention mechanism is used to obtain accurate position information and enhance the expression of effective features. Meanwhile, GhostConv replaces part of the Conv layers to reduce the computational complexity of the network. The model achieves mAP of 94.2%, 92.8% and 83.3% on the three public datasets OPIXray, SIXray and HiXray, respectively, which is 5.4, 0.5 and 1.7 percentage points higher than the baseline model, without a significant increase in training time. It balances detection accuracy and speed and is superior to many current advanced algorithms.
    Few Samples Data Augmentation Method Based on NVAE and OB-Mix
    YANG Wei, ZHONG Mingfeng, YANG Gen, HOU Zhicheng, WANG Weijun, YUAN Hai
    Computer Engineering and Applications    2024, 60 (2): 103-112.   DOI: 10.3778/j.issn.1002-8331.2208-0326
    Because deep learning models depend heavily on massive labeled data, many cutting-edge target detection methods are difficult to apply in industrial detection. To this end, a small-sample data augmentation method based on NVAE for image generation and OB-Mix for data regularization is proposed. Specifically, a data distribution model of the detection target images is built with NVAE, and new target images that belong to the same distribution as the real target images are generated by sampling latent variables. Once the generated target images are obtained, the proposed OB-Mix data augmentation strategy mixes them with background images at random positions to construct new image data, thereby improving the localization and generalization abilities of the network. Using only 474 labeled images and 400 background images without detection targets, the detection precision of YOLOv5 reaches 95.86%, which is 17.60 percentage points higher than training without this method.
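The mixing step in the abstract above places a generated target image onto a background image at a random position. The sketch below shows one plausible reading: a hard paste that also returns the bounding box needed as a detection label. Whether OB-Mix pastes or blends pixels is not stated in the abstract, so the hard paste is an assumption, and `paste_at_random` is a hypothetical helper that operates on images stored as nested lists.

```python
import random


def paste_at_random(background, target, rng):
    """Paste target onto a copy of background; return the image and its box.

    The box is (left, top, right, bottom) with exclusive right and bottom edges.
    The target must not be larger than the background.
    """
    bg_h, bg_w = len(background), len(background[0])
    tg_h, tg_w = len(target), len(target[0])
    top = rng.randint(0, bg_h - tg_h)   # randint includes both endpoints
    left = rng.randint(0, bg_w - tg_w)
    mixed = [row[:] for row in background]  # leave the original background untouched
    for r in range(tg_h):
        mixed[top + r][left:left + tg_w] = target[r]
    return mixed, (left, top, left + tg_w, top + tg_h)


# Hypothetical 5x6 empty background and 2x2 target patch.
rng = random.Random(0)
background = [[0] * 6 for _ in range(5)]
target = [[1, 1], [1, 1]]
mixed_image, box = paste_at_random(background, target, rng)
```

Because the box is produced together with the image, every synthesized sample comes with its annotation, which is what makes this kind of augmentation useful when labeled data are scarce.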
    Knowledge Graph Embedding Model Based on k-Order Sampling and Graph Attention Networks
    LIU Wenjie, YAO Junfei, CHEN Liang
    Computer Engineering and Applications    2024, 60 (2): 113-120.   DOI: 10.3778/j.issn.1002-8331.2208-0339
    Knowledge graph embedding (KGE) aims to map the entities and relations of a knowledge graph into a low-dimensional space to obtain their vector representations. Existing KGE models consider only first-order neighbors, which limits the accuracy of reasoning and prediction tasks on knowledge graphs. To solve this problem, a novel KGE model based on a k-order sampling algorithm and graph attention networks is proposed. Firstly, a k-order sampling algorithm is proposed to obtain the neighbors’ features of a central entity by aggregating its k-order neighborhood in the pruned subgraph. Then, graph attention networks are introduced to learn the attention values of the central entity’s neighbors, and the new entity embedding is obtained as the weighted sum of the neighbors’ features. Finally, ConvKB is used as a decoder to analyze the global embedding properties of a triple. Evaluation experiments on several datasets (WN18RR, FB15k-237, NELL-995 and Kinship) show that the model performs better than state-of-the-art models on the link prediction task. In addition, the influence of the order k and the sampling coefficient b on the model's hit rate is discussed.
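The abstract above obtains a new entity embedding as the attention-weighted sum of neighbor features. The sketch below shows only that aggregation step: raw attention scores are normalized with a softmax and used to weight the neighbor vectors. In a graph attention network the raw scores come from a learned scoring function. Here they are simply given as inputs, and `attention_aggregate` is an illustrative name.

```python
import math


def attention_aggregate(scores, neighbor_features):
    """Softmax-normalize scores and return (weights, weighted sum of features)."""
    peak = max(scores)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(neighbor_features[0])
    embedding = [
        sum(w * feature[d] for w, feature in zip(weights, neighbor_features))
        for d in range(dim)
    ]
    return weights, embedding


# Hypothetical central entity with two neighbors that receive equal raw scores.
weights, embedding = attention_aggregate([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
# weights == [0.5, 0.5] and embedding == [0.5, 0.5]
```

With k-order sampling, the neighbor list would include entities up to k hops away in the pruned subgraph rather than only direct neighbors.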
    Enhanced Cascading Recognition with Positional Labels for Chinese Medicine Named Entity
    WANG Xuyang, ZHAO Lijie, ZHANG Jiyuan
    Computer Engineering and Applications    2024, 60 (2): 121-128.   DOI: 10.3778/j.issn.1002-8331.2208-0345
    General-domain named entity recognition methods cannot be directly applied to Chinese medical entities, and existing research focuses only on medical entities in English text with flat structures. To address these problems, a cascade recognition method for Chinese medical entities is proposed, based on a study of named entity recognition methods in the medical field and the characteristics of Chinese medical entities. The position label of each character relative to the entity is embedded into the model, and a fused representation of the entity is built by weighting the importance of different elements within the span of a Chinese medical entity. Firstly, the position labels of characters are detected by sequence labeling. Then, the position information of the characters is used to guide the generation of candidate entities. Finally, the entities are semantically classified. Recognition experiments on flat entities, nested entities and discontinuous long entities are carried out on the CMeEE and CCKS2018 datasets and a Chinese diabetes research literature dataset, respectively. Experimental results show that the method can effectively identify entities with different structures in Chinese medical texts.
    Category Decoupled Few-Shot Classification for Graph Neural Network
    DENG Gelong, HUANG Guoheng, CHEN Ziyan
    Computer Engineering and Applications    2024, 60 (2): 129-136.   DOI: 10.3778/j.issn.1002-8331.2208-0373
    Existing metric-based few-shot image classification models achieve a certain level of few-shot learning performance. However, these models often neglect to extract the key features of the data being classified, so redundant image information unrelated to classification is absorbed into the network parameters of the metric method, which easily becomes a performance bottleneck for metric-based few-shot image classification. To address this problem, a category decoupled few-shot image classification model (VT-GNN) based on a graph neural network is proposed. It combines image self-attention with a variational autoencoder supervised by a classification task as the embedding module, obtaining category-decoupled features of the original image that serve as the nodes of a graph. A set of few-shot training data is converted into graph-structured data by constructing edge features that carry metric information between nodes through a multilayer perceptron, and few-shot learning is achieved through the message passing mechanism of the graph neural network. On the publicly available Mini-Imagenet dataset, VT-GNN achieves performance gains of 18.10 percentage points and 16.25 percentage points over the baseline graph neural network model in the 5-way 1-shot and 5-way 5-shot settings, respectively.
    Multimodal Sentiment Analysis Based on Information Bottleneck
    CHENG Zichen, LI Yan, GE Jiangwei, JIU Mengfei, ZHANG Jingwei
    Computer Engineering and Applications    2024, 60 (2): 137-146.   DOI: 10.3778/j.issn.1002-8331.2207-0456
    In the field of multimodal sentiment analysis, previous research has mainly focused on how to interactively fuse information from different modalities. However, with complex fusion strategies, the generated multimodal representation vector inevitably carries a lot of noise that is irrelevant to downstream tasks, which leads to a high risk of overfitting and degrades the quality of prediction results. To solve these problems, following information bottleneck theory, this paper designs a mutual information estimation module containing two mutual information estimators. The module optimizes the lower bound of the mutual information between the multimodal representation vector and the true label, while minimizing the mutual information between the multimodal representation vector and the input data, in order to find a concise multimodal representation vector with better predictive ability. Comparative experiments on the MOSI, MOSEI and CH-SIMS datasets show that the proposed method is effective.
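    In information bottleneck terms, the trade-off described in this abstract is commonly written as the objective below. The notation is assumed here rather than taken from the paper: x is the input data, z the multimodal representation vector, y the true label, θ the model parameters, and β a trade-off weight.

```latex
\max_{\theta} \; I(z; y) \; - \; \beta \, I(z; x)
```

    The first term rewards representations that are predictive of the label; the second penalizes information retained from the input that is not needed for prediction.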
    Knowledge Graph Completion Method Based on Neighborhood Hierarchical Perception
    LIANG Meilin, DUAN Youxiang, CHANG Lunjie, SUN Qifeng
    Computer Engineering and Applications    2024, 60 (2): 147-153.   DOI: 10.3778/j.issn.1002-8331.2210-0023
    Knowledge graph completion (KGC) aims to infer the missing values of triples by using the existing knowledge in a knowledge graph. Recent studies show that applying graph convolution networks (GCN) to the KGC task can improve the inference performance of the model. However, most current GCN models treat neighborhood information equally, ignore the different contributions of neighboring entities to the central entity, and use a simple linear transformation to update the relation embedding. To address these problems, a neighborhood aware hierarchical attention network, named NAHAT, is proposed. To improve the expressive ability of the model, NAHAT introduces entity feature information into relation updating, and aggregates entity and relation representations to enrich heterogeneous relation semantics. At the same time, NAHAT applies self-adversarial negative sample training to the loss calculation to train the model efficiently. Compared with composition-based multi-relational graph convolutional networks, the Hits@1 and Hits@10 metrics of the proposed model increase by 3% and 2.6% respectively on the FB15K-237 dataset, and by 0.9% and 2.2% respectively on the WN18RR dataset. Experimental results demonstrate the effectiveness of the proposed model.
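    For readers unfamiliar with the Hits@1 and Hits@10 metrics reported here, the following sketch shows how Hits@k is usually computed from the rank of the correct entity in each test query. The function name `hits_at_k` and the sample ranks are illustrative assumptions, not data from the paper.

```python
def hits_at_k(ranks, k):
    """Fraction of test queries whose correct entity is ranked in the top k.

    ranks: 1-based rank of the correct entity for each test triple.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks for four test triples.
example_ranks = [1, 3, 12, 2]
hits1 = hits_at_k(example_ranks, 1)    # only the first query counts
hits10 = hits_at_k(example_ranks, 10)  # three of four queries count
```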
    Scene Text Detection Based on Multi-Scale Pooling and Bidirectional Feature Fusion
    WEI Zheliang, LI Yueyang, LUO Haichi
    Computer Engineering and Applications    2024, 60 (2): 154-161.   DOI: 10.3778/j.issn.1002-8331.2210-0077
    Text in natural scenes has complex backgrounds and varies in shape and size. To address this problem, a new segmentation-based scene text detection network is proposed. The network performance is improved by building two modules: multi-scale pooling and bidirectional feature fusion. According to the characteristics of text instances, the multi-scale pooling module uses spatial pooling with windows of different aspect ratios to capture the dependency of text information at different distances, which guides the network to obtain more accurate segmentation results. The bidirectional feature fusion module constructs two fusion paths in different directions to better utilize the features of the backbone network at different scales and improve the network’s detection performance for texts of different scales. The experimental results demonstrate the effectiveness of the proposed method. On the three open datasets ICDAR2015, MSRA-TD500 and Total-Text, F-measure values of 87.7%, 86.7% and 85.5% are obtained, respectively.
    Enhanced Contextual Neural Topic Model for Short Texts
    LIU Gang, WANG Tongli, TANG Hongwei, ZHAN Kai, YANG Wenli
    Computer Engineering and Applications    2024, 60 (1): 154-164.   DOI: 10.3778/j.issn.1002-8331.2212-0259
    Most current topic models are built on the word co-occurrence information of the texts themselves, and do not introduce topic sparsity constraints to improve the model’s topic extraction ability. In addition, short texts suffer from word co-occurrence sparsity, which seriously affects the accuracy of short text topic modeling. In response to these problems, an enhanced context neural topic model (ECNTM) is proposed. ECNTM uses a topic controller to impose sparsity constraints on topics and filter out irrelevant topics. At the same time, the input of the model becomes the concatenation of the BOW vector and the SBERT sentence embedding. In the Gaussian decoder, the distribution of a topic over words in the embedding space is treated as a multivariate Gaussian distribution or a Gaussian mixture distribution, which explicitly enriches the limited context information of short texts and alleviates the sparsity of word co-occurrence features in short texts. Experimental results on four public datasets, WS, Reuters, KOS and 20 NewsGroups, show that the model significantly improves on the benchmark model in terms of perplexity, topic consistency, and text classification accuracy, which demonstrates the effectiveness of introducing topic sparsity constraints and rich contextual information for short text topic modeling.
    Cross-Modal Emotion Analysis of Semantic and Spatio-Temporal Dynamic Interaction
    QU Licheng, QIE Liyuan, LIU Zijun, WEI Si, DONG Zhewei
    Computer Engineering and Applications    2024, 60 (1): 165-173.   DOI: 10.3778/j.issn.1002-8331.2207-0498
    To address the poor interaction between modalities and the weak fusion of spatial and temporal features in traditional sentiment analysis, a cross-modal semantic and spatio-temporal dynamic interaction network is proposed. Bi-directional long short-term memory is introduced to mine the time series features of each modality. Meanwhile, a self-attention mechanism is added to strengthen the weight distribution of features within each modality, and the automatically screened feature matrix is sent to graph convolutional neural networks for semantic interaction. Then, features are aggregated based on the timestamp, the correlation coefficient of the aggregation layer is calculated, and the fused features are obtained to realize cross-modal spatial interaction. Finally, the classification and prediction of emotional polarity are performed. The proposed model is evaluated on public datasets. The experimental results show that multi-modal time series extraction and the cross-modal semantic space interaction mechanism can achieve full dynamic fusion of intra-modal and inter-modal features, and effectively improve the accuracy and F1 value of sentiment classification. On the CMU-MOSEI dataset, accuracy and F1 value increase by 1.7%~13.5% and 2.1%~14.0% respectively, showing good robustness and advancement.
    Nested Named Entity Recognition Method Based on Span Decoding
    NIAN Yongming, CHEN Yanping, QIN Yongbin, HUANG Ruizhang
    Computer Engineering and Applications    2024, 60 (1): 174-181.   DOI: 10.3778/j.issn.1002-8331.2208-0293
    Span classification is a popular method for nested named entity recognition but suffers from high complexity and data imbalance due to the need to exhaust and validate each span. Moreover, since the prediction is performed for each span individually, the dependencies among the entities present in the text sequence are ignored. To address these problems of span classification methods, a nested named entity recognition method based on span decoding is proposed. First, the text is encoded by combining lexical features, character features, word features, and contextual features to obtain rich semantic information. Then, the possible entity start positions are identified, and the possible entity spans are enumerated on this basis, which reduces the number of potential entity spans to some extent. Finally, the type of the entity span corresponding to each start is predicted one by one using a decoder based on an attention mechanism. The decoding process passes on the predicted entity information, and thus captures and learns the dependencies between entities. Experimental results show that span decoding can effectively improve span classification, and the proposed method improves F1 scores by 0.45 and 0.14 percentage points on the public English nested entity datasets ACE2005 and GENIA, respectively.
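    The reduction in candidate spans obtained by first identifying start positions can be illustrated with a small sketch. The function names, the `max_len` cap and the example start positions are assumptions for illustration; the paper's start detector and decoder are not reproduced here.

```python
def candidate_spans(seq_len, starts, max_len):
    """Enumerate (start, end) spans, inclusive, only from predicted starts."""
    spans = []
    for s in sorted(set(starts)):
        # The end index may not run past the sequence or the length cap.
        for e in range(s, min(seq_len, s + max_len)):
            spans.append((s, e))
    return spans


def all_spans(seq_len, max_len):
    """Exhaustive enumeration, as in plain span classification."""
    return candidate_spans(seq_len, range(seq_len), max_len)


# Hypothetical 5-token sentence with predicted entity starts at tokens 1 and 3.
reduced = candidate_spans(5, [1, 3], 3)
exhaustive = all_spans(5, 3)
```

    In this toy case the start-guided enumeration yields 5 candidate spans instead of 12.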
    Risk Identification Method for News Public Opinion Driven by Prompt Learning
    ZENG Huiling, LI Lin, LYU Siyang, HE Zheng
    Computer Engineering and Applications    2024, 60 (1): 182-188.   DOI: 10.3778/j.issn.1002-8331.2208-0295
    Identifying a company’s risks from news reports can quickly locate the risk categories the company is exposed to, helping enterprises take timely response measures. News public opinion risk identification is generally a multi-class classification task over risk labels. Deep learning methods represented by BERT use the pre-training + fine-tuning paradigm, which performs well in text classification tasks. However, there is little labeled data in the field of news public opinion, which makes this a small-sample machine learning problem. The new paradigm represented by prompt learning offers a new way to improve small-sample classification performance, and existing studies have shown that it outperforms pre-training + fine-tuning in many tasks. Inspired by this work, this paper proposes a news public opinion risk identification method based on prompt learning. A news public opinion risk prompt template is designed on top of the BERT pre-trained model, and after training with the MLM (masked language model) objective, the predicted label is mapped to an existing risk label through answer engineering. The experimental results show that prompt learning outperforms fine-tuning on the news public opinion datasets across different small-sample sizes.
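    The template and answer-engineering steps can be sketched without a model as follows. Everything concrete here is hypothetical: the template wording, the label words in `VERBALIZER`, and the mask-position scores, which in practice would come from a masked language model such as BERT.

```python
# Hypothetical prompt template with one masked slot.
TEMPLATE = "{text} This news is about [MASK] risk."

# Hypothetical verbalizer: label word at the mask -> risk label.
VERBALIZER = {
    "legal": "Legal risk",
    "financial": "Financial risk",
    "reputation": "Reputation risk",
}


def build_prompt(text):
    """Wrap a news text in the prompt template."""
    return TEMPLATE.format(text=text)


def map_answer(mask_word_scores, verbalizer=VERBALIZER):
    """Answer engineering: pick the best-scoring label word, return its label.

    mask_word_scores: word -> score at the [MASK] position (assumed given).
    Words outside the verbalizer are ignored; None if no label word appears.
    """
    candidates = {w: s for w, s in mask_word_scores.items() if w in verbalizer}
    if not candidates:
        return None
    best_word = max(candidates, key=candidates.get)
    return verbalizer[best_word]
```

    For example, `map_answer({"legal": 0.2, "financial": 0.7, "weather": 0.9})` ignores the out-of-vocabulary word and returns the label for "financial".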
    Time Series Classification Method with Local Attention Enhancement
    LI Kewen, KE Cuihong, ZHANG Min, WANG Xiaohui, GENG Wenliang
    Computer Engineering and Applications    2024, 60 (1): 189-197.   DOI: 10.3778/j.issn.1002-8331.2207-0444
    Existing time series classification methods are generally based on a recurrent network structure to handle the coupling between point values of a time series, but such structures cannot be computed in parallel, which wastes computing resources. Therefore, this paper proposes a time series classification method with local attention enhancement. Mixed distance information is fitted to improve the perception of position information in the time series, and is incorporated into the self-attention matrix calculation to extend the self-attention mechanism. Multi-scale convolution attention is constructed to obtain multi-scale local forward information, which addresses the attention confusion problem in the point value calculation of the standard self-attention mechanism. The improved self-attention mechanism is used to construct the sequential self-attention classification module, and the time series classification task is processed by parallel computation. The experimental results show that, compared with existing time series classification methods, the proposed method accelerates convergence and effectively improves the classification performance on time series.
    Real-Time Cross-Camera Vehicle Tracking Method for Tunnel Scenes by Fusing Spatiotemporal Features
    GOU Lingtao, SONG Huansheng, ZHANG Zhaoyang, WEN Ya, LIU Lichen, SUN Shijie
    Computer Engineering and Applications    2023, 59 (24): 88-97.   DOI: 10.3778/j.issn.1002-8331.2307-0017
    Cross-camera vehicle tracking is of great significance for intelligent transportation. In tunnel scenes, existing target re-identification schemes struggle to meet the accuracy and real-time requirements of practical vehicle tracking because of low illumination and the similar appearance of vehicles of the same type. A cross-camera multi-target tracking method is proposed that considers the vehicle type and spatiotemporal characteristics of vehicles in tunnel traffic scenes. Firstly, the normalized attention module (NAM) is added to the YOLOv7 target detection model to make the model focus more on the region of interest, and the vehicle position coordinates in real space are obtained by combining the camera calibration method. Secondly, target position prediction is combined with vehicle velocity based on Kalman filtering, the secondary association strategy (BYTE) is applied to complete single-camera vehicle tracking, and an interval frame method is used to improve the tracking speed. Finally, a cross-camera target matching cost matrix based on vehicle type and spatiotemporal characteristics is proposed, and the Hungarian algorithm is used to complete vehicle target matching, so as to realize cross-camera vehicle tracking and generate the vehicle spatiotemporal map of the tunnel scene. The experimental results on the constructed cross-camera vehicle tracking dataset show that the tracking accuracy reaches 82.1%, the overall speed of detection and tracking reaches 115 frames per second, and the accuracy of cross-camera target matching reaches 94.9%; both tracking speed and accuracy are better than those of other methods.
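    The matching step takes a cost matrix between vehicles seen by two cameras and finds the one-to-one assignment with minimum total cost. The sketch below is not the Hungarian algorithm itself: it uses brute-force search over permutations, which returns the same optimal assignment on small square matrices and keeps the example dependency-free. The cost values are made up; in practice a polynomial-time solver such as SciPy's `linear_sum_assignment` would be used.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Optimal one-to-one matching for a small square cost matrix.

    cost[i][j]: cost of matching vehicle i (camera A) to vehicle j (camera B).
    Brute force is O(n!), so this is only suitable for tiny illustrations.
    """
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[i][p[i]] for i in range(n)),
    )
    return list(enumerate(best))  # list of (row, column) pairs


# Hypothetical costs, e.g. combining vehicle-type mismatch and timing error.
cost_matrix = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
matches = min_cost_assignment(cost_matrix)
```

    Here the greedy choice of the cheapest cell (row 1, column 1) is not part of the optimal solution, which is why a global assignment method is needed.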
    Palm Vein Recognition Network Combining Transformer and CNN
    WU Kai, SHEN Wenzhong, JIA Dingding, LIANG Juan
    Computer Engineering and Applications    2023, 59 (24): 98-109.   DOI: 10.3778/j.issn.1002-8331.2208-0086
    To address the low accuracy of palm vein feature extraction and recognition, a palm vein recognition network, PVCodeNet, is proposed. It combines an improved BasicBlock with a Transformer Encoder, and uses AAM-loss (additional angular margin loss) to expand the decision boundary. A Transformer Encoder is successfully applied to global feature extraction of palm veins for the first time. The improved BasicBlock uses Do-Conv in place of Conv for feature extraction, which makes the extracted features more distinctive. It also adds the standardized attention module NAM, which extracts detailed features in the channel and spatial domains by applying a heavy sparsity penalty to suppress the weights of insignificant features. This paper describes key point location, ROI extraction and image enhancement in detail, and then reports detailed experiments on the feature vector dimension and the AAM-loss parameter settings. Finally, ablation experiments are carried out on the PolyU database and the self-built database SEPAD-PV, where the EER reaches 0, achieving the highest recognition rate. To verify the generalization performance of the network, it is also evaluated on the palmprint database Tongji and the finger vein database SDUMLA, which have similar texture features. Its EER is far better than that of other mainstream algorithms, which demonstrates the superiority of the algorithm.
    Multi-Scale End-to-End Speaker Recognition System Based on Improved Res2Net
    DENG Lihong, DENG Fei, ZHANG Gexiang, YANG Qiang
    Computer Engineering and Applications    2023, 59 (24): 110-120.   DOI: 10.3778/j.issn.1002-8331.2208-0085
    Lightweight convolutional neural networks in speaker recognition systems have weak feature extraction ability and poor recognition performance. To improve feature extraction, many methods use deeper, wider and more complex network structures, which makes the number of parameters and the inference time increase exponentially. This paper introduces Res2Net, previously used in target detection tasks, to the speaker recognition task, and verifies its effectiveness and robustness there. An improved variant, FullRes2Net, is proposed, which has stronger multi-scale feature extraction capability without increasing the number of parameters and improves performance by 17% compared with Res2Net. Meanwhile, to address the problems of existing attention methods, compensate for the shortcomings of convolution itself and further enhance the feature extraction ability of convolutional neural networks, mixed time-frequency channel attention is proposed. Experiments on the Voxceleb dataset show that the proposed method effectively improves the feature extraction ability and generalization ability of the system, with a 34% performance improvement compared with Res2Net, and outperforms advanced speaker recognition systems that use complex structures. The system is an end-to-end structure with fewer parameters and higher efficiency, suitable for applications in realistic scenarios.
    Agglutinative Languages Named Entity Recognition Based on Pruner and Multilingual Fine-Tuning
    LUO Kai’ang, Abudukelimu Halidanmu, LIU Chang, Abudukelimu Abulizi, GUO Wenqiang
    Computer Engineering and Applications    2023, 59 (24): 121-130.   DOI: 10.3778/j.issn.1002-8331.2208-0109
    Minority languages, represented by Uyghur, are agglutinative and low-resource, which poses great challenges for named entity recognition. Meanwhile, multilingual models suffer from large parameter scale, large vocabularies, and slow inference speed. To explore the best fine-tuning strategy for alleviating the low-resource problem, monolingual and multilingual fine-tuning are performed for five agglutinative languages: Uyghur, Kazakh, Kirghiz, Uzbek, and Tatar. The experimental results show that, compared with the model before pruning, CINO-Agglu reduces the model size, number of parameters, vocabulary size, and inference time by 45%, 44%, 92%, and 38%, respectively, and its average F1 score on the five languages is 85.9%, which exceeds all baseline models. Including appropriately sized data from the same language branch helps improve the fine-tuning effect.
    Optimization Algorithm of Elite Pool Dwarf Mongoose Based on Lens Imaging Reverse Learning
    JIA Heming, CHEN Lizhen, LI Shanglong, LIU Qingxin, WU Di, LU Chenghao
    Computer Engineering and Applications    2023, 59 (24): 131-139.   DOI: 10.3778/j.issn.1002-8331.2208-0291
    Dwarf mongoose optimization (DMO) is a newly proposed metaheuristic algorithm with strong global exploration ability and stability. However, because only the female leader guides the search of the whole mongoose population in the original algorithm, it suffers from slow convergence, a tendency to fall into local optima, and a poor balance between the exploration and exploitation stages. To solve these problems, this paper proposes an improved dwarf mongoose optimization (IDMO). Firstly, the lens imaging reverse learning strategy is adopted to keep the algorithm from falling into local optima during iteration and to enhance its exploration ability. Then, the elite pool strategy is introduced into the Alpha group foraging, which improves the convergence accuracy of the algorithm and further enhances its exploration ability. Experiments on benchmark functions show that IDMO has good optimization performance and robustness, and that its convergence speed is significantly improved. Finally, solving the car crashworthiness optimization problem further verifies that the IDMO algorithm has good applicability and effectiveness.
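    The abstract does not state the lens imaging reverse learning formula. A form commonly used for this strategy in the metaheuristics literature is x* = (a + b)/2 + (a + b)/(2k) - x/k, where a and b are the lower and upper bounds of the search dimension and k is a scaling factor; the sketch below assumes that form, and the function name `lens_opposite` is hypothetical.

```python
def lens_opposite(x, lower, upper, k):
    """Lens imaging opposite of x within [lower, upper] (assumed formula).

    With k = 1 this reduces to the standard opposition point lower + upper - x.
    Larger k pulls the opposite point toward the midpoint of the interval.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return (lower + upper) / 2 + (lower + upper) / (2 * k) - x / k


def lens_opposite_solution(solution, lower, upper, k):
    """Apply the mapping to every dimension of a candidate solution."""
    return [lens_opposite(x, lo, hi, k) for x, lo, hi in zip(solution, lower, upper)]
```

    In a typical use, the opposite solution is evaluated alongside the current one and the better of the two is kept, which gives the search a chance to jump away from a local optimum.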
    Classified Convolutional Neural Networks for Sparse Point Clouds Regularization Disposing
    LI Hengyu, YANG Jiazhi, SHEN Jie, ZHANG Junkai
    Computer Engineering and Applications    2023, 59 (24): 140-146.   DOI: 10.3778/j.issn.1002-8331.2209-0198
    Deep learning is one of the important methods for point cloud classification, but it usually fails to fully extract local spatial correlation because point clouds are sparse, unordered and limited. Directly using convolution to extract features of points leads to the loss of feature information. To this end, this paper proposes a convolutional neural network based on the X-transform (XTNet) for point cloud classification. Firstly, XTNet applies the X-transform to the original input point cloud and permutes the points into a latent canonical order, which suppresses the influence of point cloud disorder and sparsity on the convolution operation and avoids information loss during convolution. Then, the [K] nearest neighbor algorithm is used to construct local regions, and convolution layers are used to extract local information. Next, while local features are extracted, channel expansion is used to increase information transmission and enrich the features. Finally, skip connections are set between the local feature extraction modules to further reduce the loss of local information. Experiments are carried out on the standard public dataset ModelNet40 and the real-world dataset ScanObjectNN. Experimental results show that, compared with several current mainstream high-performance networks, the classification accuracy of XTNet is improved by 0.3~4 percentage points, and that it has good robustness and universality.
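    The local region construction step can be illustrated with a plain nearest neighbor query. This is a brute-force sketch under the assumption of Euclidean distance on 3D coordinates; the function name `k_nearest` and the sample points are hypothetical, and real pipelines typically use a spatial index or a GPU implementation.

```python
import math


def k_nearest(points, center_idx, k):
    """Indices of the k points closest to points[center_idx], nearest first.

    The center itself is excluded from its own neighborhood.
    """
    center = points[center_idx]
    others = (i for i in range(len(points)) if i != center_idx)
    ordered = sorted(others, key=lambda i: math.dist(points[i], center))
    return ordered[:k]


# Hypothetical point cloud with four points.
cloud = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (5, 5, 5)]
region = k_nearest(cloud, 0, 2)
```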
    Code Quality Analysis Based on Event Graph in User Reviews
    ZHANG Peiyuan, JIANG Ying
    Computer Engineering and Applications    2023, 59 (24): 147-154.   DOI: 10.3778/j.issn.1002-8331.2209-0442
    Research on user comments on code hosting platforms shows that the code quality information reflected in user comments can help users quickly select open source code that meets their needs, and can help software developers improve code quality. However, current research extracts code quality information incompletely and inaccurately. To address this, a code quality analysis method based on an event graph is proposed to analyze the code quality information in user comments. Firstly, a code quality hierarchy diagram is constructed to represent the structure of the various kinds of code quality information. Then, user comments are analyzed and an event graph is built for user comments on code. Next, a method of mapping the event graph to the code quality hierarchy diagram is proposed. Finally, the code quality information in the code quality hierarchy diagram is identified. The experimental results show that the average accuracy of this method in identifying code quality information in code review texts is 86.9%, so the method can effectively identify and analyze code quality information.
    Research on Visualization Method of Content and Behavior Sequence for Drug-Related Cases
    SHAN Zhihua, HUANG Ruizhang, DUAN Xihui, CHEN Yanping, QIN Yongbin
    Computer Engineering and Applications    2023, 59 (23): 86-94.   DOI: 10.3778/j.issn.1002-8331.2208-0294
    The purpose of case visualization of drug-related cases is to show the development process of the case from different levels of detail through visualization technology and quickly understand the case. The traditional text visualization methods mostly extract text features for visualization, which will lose a large amount of important semantic information in the text and is not suitable for the visualization of drug-related cases. The visualization of the behavior sequence of drug-related cases is to display the laws of the criminal behavior through the pattern mining and sequence visualization of the behavior sequence. It makes up for the defect that it takes a lot of time and experience to manually discover the behavior rules of criminals from cases. In view of the above problems, a case text and behavior sequence visualization method is proposed. For a single case text, the method takes “sequential relationship, primary and secondary relationship” as the core idea, and constructs a case description diagram; for multiple case texts, this method extracts the behavior of the criminal in the case to construct a sequence, constructs a similar node tree to reduce the difference between the sequences, and mines and visualizes the sequence patterns. The method is applied to the data set of drug-related judicial cases provided by the Higher People’s Court of Guizhou Province, and the results of user interviews show that the method is effective. It provides practical ideas and methods for judicial and public security personnel to intuitively understand the contents of cases and explore the behavior patterns of criminals.
    Facial Expression Recognition Based on Attention Mechanism and Involution
    GUO Jingyuan, DONG Yishan, LIU Xiaowen, LU Shuhua
    Computer Engineering and Applications    2023, 59 (23): 95-103.   DOI: 10.3778/j.issn.1002-8331.2207-0412
    To solve problems such as background interference and unbalanced spatial information distribution in complex facial expression recognition, this paper proposes a facial expression recognition network improved by the attention mechanism and the Involution operator. Using VGG19 as the baseline, it introduces the attention mechanism at the front end to extract vital features of facial expressions and suppress background interference. Joint normalization strategies are employed to balance the distribution of feature data and improve the training quality of the model. At the back end, dense connections are used to strengthen effective feature reuse and extract deeper semantic information. The proposed network is validated on three public datasets, CK+, FER2013 and RAF-DB, achieving a significant improvement in accuracy and outperforming some state-of-the-art methods. In addition, to improve the ability of the network to process datasets with complex conditions, the Involution operator is introduced at the back end to replace some of the convolution operators, which enhances the perception of spatially diverse information. Experimental results on complex datasets such as RAF-DB validate that the proposed model can effectively improve the accuracy of facial expression recognition.
    Self-Attention Subtype Recognition Neural Network for Classification of Kidney Renal Clear Cell Carcinoma Data
    LI Yang, CHEN Xicheng, WU Yazhou
    Computer Engineering and Applications    2023, 59 (23): 104-113.   DOI: 10.3778/j.issn.1002-8331.2304-0280
    To analyse transcriptomic data of kidney renal clear cell carcinoma (KIRC), a modified classification model is constructed using the self-attention mechanism. A new self-attention subtype recognition neural network (SSRNN) is constructed, which includes an encoder and a classifier and takes the self-attention mechanism as the main improvement. After screening 358 survival-related protein-coding genes, three subtypes are identified by cluster analysis. Comparison of clinical information and survival analysis for the three cancer subtypes C1, C2 and C3 reveals differences in survival outcomes among the groups. SSRNN achieves the best classification performance, with an area under the curve of 93.44%. Gene expression heatmaps reveal differences in gene expression among the three subtypes, suggesting that low gene expression indicates better survival prognosis. By analysing the differences among the three subtypes and drawing the volcano map, 266 differentially expressed genes are obtained. GO and KEGG enrichment analyses and node mapping help reveal cancer-related functions and pathways. Therefore, the SSRNN has high prediction accuracy and robustness, can effectively use omics data to predict the survival of KIRC patients and screen reasonable biomarkers, and has high methodological significance and application value.
    Vehicle Re-Identification Based on Dual Attention and Exact Feature Distribution Matching
    XU Yan, PAN Xuguang, GUO Xiaoyan, LIU Xianglan
    Computer Engineering and Applications    2023, 59 (23): 114-124.   DOI: 10.3778/j.issn.1002-8331.2305-0458
    To solve the problems of weak fine-grained feature extraction and poor domain generalization in current vehicle re-identification (Re-ID) methods, a vehicle Re-ID method based on dual attention and exact feature distribution matching is proposed. A new dual attention mechanism is proposed, and WideResNet50 with a dual attention module is used to construct the front part of a multi-fine-grained feature extraction network, which efficiently models global contextual information and enhances the extraction of fine-grained vehicle features. A style transfer strategy based on exact feature distribution matching is incorporated into the shallow layers of the backbone to enhance the domain diversity of the source domain and achieve data augmentation, thus effectively improving the cross-domain performance and feature representation capability of vehicle Re-ID. A depth-by-depth multi-scale feature pyramid structure is designed to enhance feature extraction and integrate multi-level information from feature layers at different scales. Feature map segmentation is applied to the vehicle features output by this structure to highlight local fine-grained information and enhance the sensitivity of the model to it. Tuplet margin loss is introduced to alleviate overfitting to the hardest samples. Experimental results on two large benchmark vehicle datasets, VeRi-776 and VehicleID, show that the proposed algorithm achieves good Re-ID results on both single-domain and cross-domain tasks.
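    Exact feature distribution matching is generally implemented as sort matching: each content value is replaced by the style value of the same rank, so the output has exactly the style's empirical distribution while keeping the content's ordering. The sketch below shows this idea on flat lists of equal length; it is an illustration under that assumption, not the paper's network code, and the function name is hypothetical.

```python
def exact_feature_distribution_matching(content, style):
    """Replace each content value with the style value of the same rank.

    content, style: equal-length lists of feature values (e.g. one flattened
    channel each). The output is a permutation of the style values arranged
    in the rank order of the content values.
    """
    if len(content) != len(style):
        raise ValueError("content and style must have the same length")
    # Indices of content values from smallest to largest.
    order = sorted(range(len(content)), key=lambda i: content[i])
    sorted_style = sorted(style)
    matched = [0.0] * len(content)
    for rank, idx in enumerate(order):
        matched[idx] = sorted_style[rank]
    return matched
```

    For example, content `[0.3, 0.1, 0.2]` with style `[10, 30, 20]` gives `[30, 10, 20]`: the largest content position receives the largest style value.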