Content of Pattern Recognition and Artificial Intelligence in our journal

    Correlation Filter for Object Tracking Method Based on Sparse Representation
    SHE Xiangyang, LUO Jiaqi, REN Haiqing, CAI Yuanqiang
    Computer Engineering and Applications    2023, 59 (11): 71-79.   DOI: 10.3778/j.issn.1002-8331.2202-0099
    Correlation-filter-based object tracking methods are easily misled by distractive features in complex scenes such as object deformation and background interference, which leads to tracking failure. To address this, a correlation filter object tracking method based on sparse representation is proposed. The method combines the correlation filter with sparse representation by imposing an L1-norm sparsity constraint on the filter in the objective function, so that the trained filter retains only the key features of the object. In addition, different penalty parameters are assigned to the filter coefficients according to their spatial positions, and the alternating direction method of multipliers (ADMM) is used to solve for the filter. Experimental results show that the method achieves the best precision and success rate among five correlation-filter-based trackers on three commonly used datasets, remains robust to distractive features in complex scenes, and meets real-time requirements.
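    For orientation, a generic form of such a spatially weighted, L1-regularized correlation filter objective (a sketch of the standard formulation, not necessarily the paper's exact one) is:

        \min_{\mathbf{w}} \; \frac{1}{2}\Big\| \mathbf{y} - \sum_{d=1}^{D} \mathbf{x}_{d} \ast \mathbf{w}_{d} \Big\|_{2}^{2} \; + \; \lambda \, \big\| \mathbf{p} \odot \mathbf{w} \big\|_{1}

    where \ast is circular correlation over D feature channels, \mathbf{p} holds the position-dependent penalty parameters and \odot is element-wise multiplication. ADMM introduces an auxiliary variable \mathbf{g} = \mathbf{w} with a scaled dual \mathbf{u} and alternates a closed-form (frequency-domain) update of \mathbf{w}, an element-wise soft-thresholding update of \mathbf{g}, and the dual update \mathbf{u} \leftarrow \mathbf{u} + \mathbf{w} - \mathbf{g}.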
    Label-Conditional Neural Topic Model for Semantic Analysis of Short Texts
    WANG Yuan, YAN Yanling, XU Maoling, HU Peng, ZHAO Tingting, YANG Jucheng
    Computer Engineering and Applications    2023, 59 (11): 80-87.   DOI: 10.3778/j.issn.1002-8331.2206-0328
    Neural topic models, as unsupervised machine learning methods, have been widely used to automatically mine latent semantics from text. However, the limited length of short texts and the scarcity of information available for inference make it difficult for such models to correctly identify ambiguous words with insufficient context. Therefore, a label-conditional neural topic model for semantic analysis of short texts is proposed. The model adopts a variational auto-encoder architecture and introduces the label information of the text as a topic-category-level semantic identifier on the topic distribution output by the encoder. This guides the model to filter out words that are not semantically relevant to the current topic, condense the semantics, identify the exact meanings of ambiguous words in the topic context, and thereby infer discrete and consistent topics. To address the pronounced bias in the topic semantic distribution of short texts, PolyLoss is introduced during training, and the imbalance of the short-text category distribution is modeled by adjusting the Taylor polynomial coefficients. Experimental results show that the model not only greatly improves the quality of short-text topic modeling and generates coherent and diverse topics, but also effectively improves the performance of downstream tasks.
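    In its simplest (Poly-1) form, PolyLoss augments cross-entropy with a weighted (1 - p_t) term whose coefficient can be tuned for imbalanced label distributions. A minimal PyTorch sketch, assuming standard single-label targets (the paper's exact coefficient schedule is not stated in the abstract):

        import torch
        import torch.nn.functional as F

        def poly1_cross_entropy(logits, targets, epsilon=1.0):
            """Poly-1 loss: cross-entropy plus epsilon * (1 - p_t)."""
            ce = F.cross_entropy(logits, targets, reduction="none")  # per-sample -log p_t
            pt = torch.softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
            return (ce + epsilon * (1.0 - pt)).mean()

    Increasing epsilon up-weights hard, low-confidence samples, which is the knob typically adjusted when the class distribution is skewed.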
    Facial Expression Recognition Method Embedded with Attention Mechanism Residual Network
    ZHONG Rui, JIANG Bin, LI Nanxing, CUI Xiaomei
    Computer Engineering and Applications    2023, 59 (11): 88-97.   DOI: 10.3778/j.issn.1002-8331.2207-0315
    Face images captured in uncontrolled environments are susceptible to complex factors such as illumination and pose changes, which leads to low face detection rates and poor expression recognition accuracy. To address this, an expression recognition method based on a residual network with an embedded attention mechanism is proposed. In the face detection stage, an improved RetinaFace algorithm performs multi-view face detection and extracts the face region. In the feature extraction stage, ResNet-50 serves as the backbone network. Firstly, the pre-processed face images are passed through the channel attention network and the spatial attention network of the model in turn to explicitly model global image interdependence. Secondly, an average pooling layer is added to the shortcut connection of the downsampling (dashed) residual units to perform the downsampling operation; this fine-tuning of the residual module strengthens the mapping between input features, so that the extracted expression features are passed through the network more completely and the loss of feature information is reduced. Finally, the convolutional block attention module (CBAM) is introduced into the network again to enhance the channel-wise and spatial information of local expression features, strengthen the focus on feature-map regions highly relevant to expressions, and suppress interference from irrelevant regions, thereby speeding up network convergence and improving the expression recognition rate. Compared with the baseline algorithm, the method achieves 87.65% and 73.57% accuracy on the RAF-DB and FER2013 expression datasets, respectively.
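    CBAM itself is a standard component: channel attention from pooled descriptors followed by spatial attention from channel-wise statistics. A minimal PyTorch sketch of that module alone (its placement inside ResNet-50 and the RetinaFace detector are not reproduced here):

        import torch
        import torch.nn as nn

        class CBAM(nn.Module):
            """Convolutional Block Attention Module: channel attention, then spatial attention."""

            def __init__(self, channels, reduction=16, spatial_kernel=7):
                super().__init__()
                self.mlp = nn.Sequential(
                    nn.Conv2d(channels, channels // reduction, 1, bias=False),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(channels // reduction, channels, 1, bias=False),
                )
                self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2, bias=False)

            def forward(self, x):
                # Channel attention: shared MLP over global average- and max-pooled descriptors.
                avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
                mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
                x = x * torch.sigmoid(avg + mx)
                # Spatial attention: convolution over channel-wise average and max maps.
                s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
                return x * torch.sigmoid(self.spatial(s))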
    Meta-Learning Method of Uyghur Morphological Segmentation
    ZHANG Yuning, LI Wenzhuo, Abudukelimu Halidanmu, Abulizi Abudukelimu
    Computer Engineering and Applications    2023, 59 (11): 98-104.   DOI: 10.3778/j.issn.1002-8331.2201-0087
    With the development of deep learning, the accuracy of Uyghur morphological segmentation has improved dramatically, but the demand for training data is high. Meta-learning can effectively alleviate a model's reliance on data volume by learning from previous tasks and is widely used in low-resource domains. Therefore, a meta-learning method for Uyghur morphological segmentation is proposed, which focuses on fast generalisation to new tasks by training on previous tasks to obtain a set of parameters that can adapt quickly. In the experiments, N pseudo meta-learning tasks are first constructed based on data similarity to partition the meta-learning support sets and query sets. The Uyghur data is then encoded with a Transformer encoder. Finally, the meta-learning method is used to perform morphological segmentation of Uyghur in few-shot settings. Experimental results show that the meta-learning method outperforms the pre-trained model on the few-shot task, effectively avoiding overfitting and mitigating the impact of data sparsity on the model.
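    The abstract describes learning an initialization that adapts quickly to new tasks; a generic first-order MAML-style sketch of that idea (not the paper's specific algorithm) is shown below, where tasks is a list of (support_x, support_y, query_x, query_y) tuples and meta_opt is an optimizer over model.parameters(), both hypothetical names:

        import copy
        import torch

        def fomaml_step(model, loss_fn, tasks, inner_lr=0.01, inner_steps=1, meta_opt=None):
            """One first-order MAML-style meta-update over a batch of tasks."""
            meta_grads = [torch.zeros_like(p) for p in model.parameters()]
            for support_x, support_y, query_x, query_y in tasks:
                learner = copy.deepcopy(model)                      # task-specific copy
                opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
                for _ in range(inner_steps):                        # adapt on the support set
                    opt.zero_grad()
                    loss_fn(learner(support_x), support_y).backward()
                    opt.step()
                opt.zero_grad()
                loss_fn(learner(query_x), query_y).backward()       # evaluate on the query set
                for g, p in zip(meta_grads, learner.parameters()):
                    g += p.grad / len(tasks)
            meta_opt.zero_grad()
            for p, g in zip(model.parameters(), meta_grads):        # first-order outer update
                p.grad = g
            meta_opt.step()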
    Traffic Sign Recognition Algorithm Based on Siamese Neural Network with Encoder
    LYU Binglue, XI Zhenghao, SHAO Yuchao
    Computer Engineering and Applications    2023, 59 (11): 105-111.   DOI: 10.3778/j.issn.1002-8331.2201-0408
    Traffic sign recognition has been applied in driver-assistance systems. However, factors such as occlusion, contamination, damage and weather can seriously affect its accuracy and robustness. To solve this problem, a traffic sign encoding and classification method based on a Siamese neural network is developed, which treats traffic sign recognition as a convolutional feature-code recognition problem. Firstly, a convolutional neural network extracts and encodes the features of training samples and reference samples. Secondly, the Siamese network compares the feature codes of training and reference samples, and the encoder is trained with a contrastive loss. Finally, a fully connected layer recombines and classifies the convolutional feature code of the input. Experimental results show that this method produces effective and robust feature codes for traffic signs under motion blur and occlusion, and achieves higher accuracy than other advanced methods.
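    The contrastive loss used to train such a Siamese encoder has a standard form; a minimal PyTorch sketch, assuming label = 1 for matched pairs and 0 for mismatched pairs (conventions vary):

        import torch
        import torch.nn.functional as F

        def contrastive_loss(emb1, emb2, label, margin=1.0):
            """Pull matched pairs together, push mismatched pairs beyond the margin."""
            d = F.pairwise_distance(emb1, emb2)
            return (label * d.pow(2) + (1 - label) * F.relu(margin - d).pow(2)).mean()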
    Target-IoU Loss: Foreground-Aware Regression Loss with Asymmetric Strategy
    SHAO Rong, CHEN Dongfang, WANG Xiaofeng
    Computer Engineering and Applications    2023, 59 (11): 112-118.   DOI: 10.3778/j.issn.1002-8331.2202-0005
    The regression loss function is one of the important components of object detection networks. Existing regression losses, whether L-norm losses or IoU-based losses, treat the two input bounding boxes with a symmetric strategy, which makes insufficient use of foreground information and affects the quality of regression. To this end, this paper proposes an asymmetric strategy to enhance the role of foreground information in the regression loss. Guided by this strategy, a TIoU (Target-IoU) loss is designed so that the network makes full use of the characteristics of the ground truth and regresses bounding boxes closer to the real values. Experimental results on the PASCAL VOC dataset show that the TIoU loss improves accuracy by 0.2 and 0.5 percentage points under the Faster R-CNN and RetinaNet frameworks, respectively.
    Text Classification Model Based on Statistical Causality and Optimal Transmission
    NIE Ting, XING Kai, LI Jingjuan
    Computer Engineering and Applications    2023, 59 (11): 119-130.   DOI: 10.3778/j.issn.1002-8331.2202-0140
    In recent years, with the growth of data scale and computing power, pre-training models such as CNN and BERT have made rapid progress in text classification. However, these models are poor at extracting distributional features and generalize badly in small-sample scenarios. The common remedies are to improve the model structure or enlarge the training set, but these approaches rely on large datasets and heavy computation for pruning the network structure. A pre-training model optimization method based on the Granger causality test and the optimal transmission distance is proposed. From the perspective of data distribution, it generates a feature path structure in the pre-training model that stably extracts distribution information. On this basis, the optimal combination of feature path structures is determined via the optimal transmission distance, yielding a multi-view structured representation that is stable in statistical distribution. Theoretical analysis and experimental results show that this method greatly reduces the data and computation requirements of model optimization. Compared with convolution-based pre-training models such as CNN, it gains 5, 7 and 2 percentage points on the 20ng news, Ohsumed and R8 datasets respectively; compared with Transformer-based pre-training models such as BERT, it gains 2, 3 and 2 percentage points respectively.
    Graph Representation Learning Model for Multi-Level Feature Augmentation
    FENG Yao, KONG Bing, ZHOU Lihua, BAO Chongming, WANG Chongyun
    Computer Engineering and Applications    2023, 59 (11): 131-140.   DOI: 10.3778/j.issn.1002-8331.2202-0229
    Representation learning on graph data has shown significant value for downstream tasks such as recommendation and link prediction. However, current methods have drawbacks: the fixed propagation of graph neural networks limits the semantic expressiveness of node representations, and encoder-decoder architectures with regularized reconstruction are prevented from learning differentiated features between nodes, so the node representations may not suit some downstream tasks. Therefore, a multi-level feature augmented graph representation learning model based on mutual information maximization is proposed, which learns high-quality node representations in an unsupervised manner. The model first uses an extractor to preserve the distinguishable features contained in the original attributes, which are then fed to an aggregator to maintain the local relevance and global difference of nodes in the encoding space. Finally, the strategy of deep graph infomax is applied to unify the global encoding rules. Experimental results demonstrate that the model outperforms all mainstream baselines on several classification benchmark datasets for both transductive and inductive learning.
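    The deep graph infomax strategy mentioned here scores (node embedding, graph summary) pairs from the real graph against pairs from a corrupted graph. A minimal sketch of the readout and bilinear discriminator, assuming positive and negative node embeddings h_pos and h_neg of the same shape (the encoder and corruption function are omitted):

        import torch
        import torch.nn as nn

        class DGIHead(nn.Module):
            """Deep Graph Infomax readout + bilinear discriminator (generic sketch)."""

            def __init__(self, dim):
                super().__init__()
                self.weight = nn.Parameter(torch.empty(dim, dim))
                nn.init.xavier_uniform_(self.weight)

            def forward(self, h_pos, h_neg):
                s = torch.sigmoid(h_pos.mean(dim=0))      # graph summary vector
                pos_logits = h_pos @ self.weight @ s      # scores for real (node, summary) pairs
                neg_logits = h_neg @ self.weight @ s      # scores for corrupted pairs
                logits = torch.cat([pos_logits, neg_logits])
                labels = torch.cat([torch.ones_like(pos_logits), torch.zeros_like(neg_logits)])
                return nn.functional.binary_cross_entropy_with_logits(logits, labels)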
    Low-Cost Self Evolving Learner Portrait Model
    GE Di, WU Yanwen, LIU Sanya
    Computer Engineering and Applications    2023, 59 (11): 141-150.   DOI: 10.3778/j.issn.1002-8331.2202-0240
    In the current complex AI-supported interactive teaching environment, learner portrait models face the curse of dimensionality and high data-updating costs. This paper therefore proposes a low-cost self-evolving learner portrait model for large-scale data (LSLP). The method first improves the traditional deep non-negative matrix factorization algorithm so as to preserve the feature structure of the original data in dual spaces, effectively reducing dimensionality and suppressing the curse of dimensionality. Then, taking a graph neural network as the information-capture medium and combining it with a deep neural network to quantify meta-attribute state values, an adaptive feature extraction and dynamic update strategy is designed to help the learner portrait model evolve continuously. Finally, four experiments are designed on the Stanford EDX platform dataset to verify the performance of the model. The experimental results show that the model reduces the cost of updating data by 45% while keeping 93.13% accuracy on the downstream teaching recommendation task.
    Research on Feature Misalignment Between Tasks in Anchor-Free Models
    HAO Shuaizheng, LIU Hongzhe
    Computer Engineering and Applications    2023, 59 (11): 151-159.   DOI: 10.3778/j.issn.1002-8331.2202-0260
    General object detection models consist of classification and regression branches. Because the two tasks are driven differently, they have different sensitivities to features from the same instance, which causes a large performance gap known as the task-feature misalignment problem. Based on the assumption that a candidate with high classification confidence also has high regression quality, the standard prediction pipeline uses only the classification score as the criterion in the NMS procedure, which yields many predictions with high classification scores but poor regression quality. This paper studies the misalignment problem in modern anchor-free detectors, decomposing it into scale misalignment and spatial misalignment, and proposes to resolve it at minimal cost: a minor modification of the head network that tweaks the receptive field of the two tasks individually, and a new label assignment method that mines the most aligned feature samples. Experiments show that, compared with the baseline FCOS, a one-stage anchor-free detector, the model consistently gains about 3 AP with different backbones, demonstrating the method's simplicity and efficiency.
    Cross-Subject ERP Detection Based on Graph and Dual Attention Mechanism
    XIANG Xiaojia, LAN Zhen, YAN Chao, LI Zixing, TANG Dengqing, ZHOU Han
    Computer Engineering and Applications    2023, 59 (11): 160-167.   DOI: 10.3778/j.issn.1002-8331.2202-0311
    In order to improve the detection accuracy of event-related potential(ERP) in subject-independent scenarios, a convolutional recurrent neural network model based on graph embedding and dual attention mechanisms is proposed. The model uses a graph to represent the spatial information in electroencephalogram(EEG) signals, and uses the cascade framework of convolutional neural network(CNN) and long short-term memory network(LSTM) as the basic framework. By embedding dual attention mechanisms(i.e., selective kernel convolution and self-attention mechanism), it can fully extract the temporal and spatial features of EEG signals of different subjects, so as to improve the ERP detection accuracy in subject-independent scenarios. A large number of experiments carried out on the benchmark dataset based on rapid serial visual presentation paradigm demonstrate that the proposed method has significant superiority over 7 existing ERP detection methods in subject-independent scenarios.
    Feature Selection Based on Improved Marine Predator Algorithm
    LI Shouyu, HE Qing
    Computer Engineering and Applications    2023, 59 (11): 168-179.   DOI: 10.3778/j.issn.1002-8331.2203-0012
    To address the low classification accuracy of the traditional K-nearest neighbor (KNN) method, this paper combines feature selection with KNN classification and uses an improved marine predator algorithm to optimize the data features. Firstly, neighborhood learning provides rich neighborhood location information to expand the search range of the marine predators; a dimensional variation mechanism is introduced to increase population diversity and avoid premature convergence to local optima; and a sine-cosine disturbance operator and a jump-step control factor are used to update predator locations, strengthening both global and local search. Secondly, feature selection is taken as the optimization objective to obtain the optimal feature subset. Finally, optimization tests on 14 classic benchmark functions and classification studies on 14 classic datasets compare optimization performance, the average size of the selected feature subsets and the average classification accuracy. The experimental results show that the proposed algorithm effectively removes redundant features and purifies the feature set, and has broad application prospects in data mining.
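    In such wrapper-style feature selection, each candidate binary feature mask is typically scored by a fitness that trades classification accuracy against subset size. A minimal scikit-learn sketch of that fitness (the weighting alpha and the 5-fold KNN evaluation are common conventions, not necessarily the paper's exact choices):

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        def fitness(mask, X, y, alpha=0.99):
            """Score a binary feature mask: weighted KNN accuracy plus a feature-reduction bonus."""
            if mask.sum() == 0:
                return 0.0
            acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                                  X[:, mask.astype(bool)], y, cv=5).mean()
            return alpha * acc + (1 - alpha) * (1 - mask.sum() / mask.size)

    The metaheuristic (here, the improved marine predator algorithm) then searches over masks to maximize this fitness.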
    Image-Text Fusion Sentiment Analysis Method Based on Image Semantic Translation
    HUANG Jian, WANG Ying
    Computer Engineering and Applications    2023, 59 (11): 180-187.   DOI: 10.3778/j.issn.1002-8331.2203-0036
    In multimodal sentiment analysis, an image can convey different emotions in different contexts or under different attention points. To address this image semantic understanding problem, an image-text fusion sentiment analysis method based on image semantic translation (ImaText-IST) is proposed. First, images are passed to an image translation module that converts them into image captions; the module integrates different emotional expressions and generates captions under three emotional polarities: positive, neutral and negative. Then, emotional correlation analysis is conducted between the captions under these three polarities and the text in the dataset to improve the accuracy of image semantic understanding. Finally, sentiment prediction is performed on the image semantic captions, targets and texts, and sentiment analysis is carried out with feature fusion and auxiliary sentences. The results show that the auxiliary-sentence variant (Axu-ImaText-IST) better captures the emotions of images and texts, and its accuracy and Macro-F1 on the social media datasets Twitter-15 and Twitter-17 are both higher than those of the benchmark models.
    Multi-Modal Meteorological Forecasting Based on Transformer
    XIANG Deping, ZHANG Pu, XIANG Shiming, PAN Chunhong
    Computer Engineering and Applications    2023, 59 (10): 94-103.   DOI: 10.3778/j.issn.1002-8331.2208-0486
    Thanks to the rapid development of meteorological observation technology, the meteorological industry has accumulated massive meteorological data, which provides an opportunity to build new data-driven forecasting methods. Because of the long-term dependencies and large-scale spatial correlations hidden in meteorological data, and the complex coupling between different modalities, meteorological forecasting with deep learning is still a challenging research topic. This paper presents a deep learning model for meteorological forecasting based on multi-modal fusion, using sequential multi-modal data at the same atmospheric pressure levels composed of four classical meteorological elements: temperature, relative humidity, and the U and V components of wind. Specifically, a convolutional network learns features from each modality, and a gating mechanism performs weighted multi-modal fusion of those features. The traditional attention mechanism is then replaced with parallel spatial-temporal axial attention to effectively learn long-term dependencies and large-scale spatial associations. Architecturally, the Transformer encoder-decoder structure is employed as the overall framework. Extensive comparative experiments on a regional ERA5 reanalysis dataset demonstrate that the proposed method is effective and superior in predicting temperature, relative humidity and wind.
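    The gated weighted fusion described here can be sketched generically as learning one gate weight per modality and summing the gated features. A minimal PyTorch sketch, assuming the per-modality features have already been projected to a common dimension (the paper's exact gating design is not specified in the abstract):

        import torch
        import torch.nn as nn

        class GatedModalFusion(nn.Module):
            """Weight each modality's feature with a learned gate, then sum."""

            def __init__(self, num_modalities, dim):
                super().__init__()
                self.gate = nn.Linear(num_modalities * dim, num_modalities)

            def forward(self, feats):                # feats: list of (B, dim) tensors
                stacked = torch.stack(feats, dim=1)  # (B, M, dim)
                weights = torch.softmax(self.gate(torch.cat(feats, dim=-1)), dim=-1)
                return (weights.unsqueeze(-1) * stacked).sum(dim=1)   # (B, dim)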
    Compound Convolutional and Self-Attention Network for Session-Based Recommendation
    XIAO Yan, HUO Lin
    Computer Engineering and Applications    2023, 59 (10): 104-113.   DOI: 10.3778/j.issn.1002-8331.2202-0128
    Recently, methods based on convolutional neural networks (CNN) have shown potential for modeling session data, especially for extracting complex local interactions of user behavior. Recurrent neural networks (RNNs) have difficulty learning dependencies between distant items, whereas self-attention (SA) structures can model the order of session events and capture interactions between distant items, making them a better choice for session data. Therefore, a compound convolutional and self-attention network (CCNN-SA) is proposed, which combines the complex local features extracted by two convolutional modules with a multi-head self-attention structure that learns long-term interactions among session events; this flexible, unified architecture enables comprehensive modeling of the important features of session sequences. The model is validated on two e-commerce benchmark datasets. The evaluation metrics Recall@20 and MRR@20, which characterize hit rate and the ranking of prediction results, improve by 2.79% and 5.87% respectively on the YOOCHOOSE dataset and by 2.17% and 6.43% respectively on the DIGINETICA dataset, verifying the validity and rationality of the model.
    Cross-Domain Recommendation Model Based on Fine-Grained Opinion from Review
    WANG Yu, WU Yun
    Computer Engineering and Applications    2023, 59 (10): 114-122.   DOI: 10.3778/j.issn.1002-8331.2201-0156
    Most existing cross-domain recommendation (CDR) methods simply use rating data and make insufficient use of review information. Reviews contain multiple user opinions; how to fully exploit fine-grained opinions in reviews and mine their potential value to better address cold-start and data sparsity in cross-domain recommendation has become a focus and difficulty of current research. Therefore, this paper designs a cross-domain recommendation model based on fine-grained opinions from reviews (FGOR-CDRM), composed of three modules: fine-grained opinion extraction from reviews, auxiliary review enhancement, and cross-domain correlation learning. Firstly, a text convolutional neural network is combined with a gating mechanism, and two global fine-grained opinion matrices guide the query to effectively extract fine-grained opinions from review text. Secondly, a further convolution layer is added on top of the text convolution, and auxiliary documents are constructed from the reviews of similar non-overlapping users, which alleviates data sparsity while increasing the diversity of training data. Finally, the correlations between cross-domain fine-grained opinions are learned: a correlation matrix is constructed from the static fine-grained opinions and semantic matching is performed to predict the ratings of items by cold-start users in the target domain. Experiments are carried out on three domain pairs built from three Amazon datasets (Book, Movies and TV, CDs and Vinyl). The results show that FGOR-CDRM outperforms the other benchmark models on all three domain pairs. Taking the "movie-book" pair as an example, FGOR-CDRM improves MAE by 6.09% over the ANR baseline and by 3.58% over the CDLFM model.
    Multi-Feature Fusion Based Model for MOOC Recommendation
    SHU Xinfeng, CAO Wangmei, WANG Shuyan
    Computer Engineering and Applications    2023, 59 (10): 123-133.   DOI: 10.3778/j.issn.1002-8331.2201-0094
    In order to make full use of the contextual information of MOOCs (massive open online courses) and accurately represent the features of learners and courses, a multi-feature fusion based model for MOOC recommendation (MFF-MOOCREC) is proposed. A text convolutional neural network and a bidirectional long short-term memory network are introduced to capture textual and sequential features from the data, and a multi-level attention mechanism is designed to extract key information from learning records, review texts and multiple course attributes. To increase the coverage of recommendation, prefix-projected pattern growth and the affinity propagation algorithm are combined for clustering analysis of the original course category labels. Probabilistic matrix factorization is used for parameter training, and predicted ratings are obtained from the dot product of learners' and courses' latent vectors. Experiments show that, compared with available methods, MFF-MOOCREC achieves average improvements of 46.86%, 41.19% and 10.95% on the Coursera dataset and 44.08%, 28.79% and 9.81% on the iCourse dataset in hit ratio, normalized discounted cumulative gain and coverage ratio respectively, indicating that the proposed model can effectively alleviate data sparseness and improve recommendation performance to some extent.
    Dual-Channel Sentiment Analysis and Application Based on Gated Attention
    WEI Long, HU Jianpeng, ZHANG Geng
    Computer Engineering and Applications    2023, 59 (10): 134-141.   DOI: 10.3778/j.issn.1002-8331.2112-0585
    Traditional deep learning based text sentiment classification models usually cannot extract features completely and cannot distinguish polysemous words. To resolve these problems, a dual-channel sentiment classification model based on gated attention, named BGA-DNet, is proposed. The model uses the BERT pre-trained model to process text data and then extracts text features through a dual-channel network: channel one uses TextCNN to extract local features, and channel two uses BiLSTM-Attention to extract global features. A gated attention unit is introduced to filter out useless attention information, and a residual connection ensures that the dual-channel output retains the original encoding information when the network reaches saturation. BGA-DNet is evaluated on two public datasets of hotel reviews and restaurant reviews and, compared with the latest sentiment classification methods, achieves the best results with accuracies of 94.09% and 91.82%, respectively. Finally, the BGA-DNet model is applied to a real dataset of students' experiment reports, where its accuracy and F1 value are also the highest.
    Knowledge Graph Multi-Target Cross-Domain Recommendation on Digital Cultural Resources
    TONG Xiaokai, ZHU Xinjuan, WANG Xihan, HU Zhulin
    Computer Engineering and Applications    2023, 59 (10): 142-150.   DOI: 10.3778/j.issn.1002-8331.2201-0129
    Digital cultural resources are rich and diverse. Considering the diversity and heterogeneity of resource types, the recommendation of digital cultural resources can be divided into several different subdomains. However, most current recommendation methods target only a single domain and therefore cannot capture the propagation of user preferences across multiple domains or make effective use of the information provided by other domains. Therefore, a knowledge graph multi-target cross-domain recommendation model (KGMT) is proposed. Firstly, the relationships between different domains are constructed through a knowledge graph, and global-domain embeddings of users and items are generated. Then, a fusion attention module based on the self-attention mechanism combines the embedding representations of the target domain and the global domain, so that the global information is used effectively to improve each target domain. Finally, experiments are carried out on real-world datasets from Douban and the national culture cloud platform. The experimental results show that KGMT outperforms the baselines and improves the evaluation indicators of the target domains.
    Multi-Temporal Scales Consensus for Weakly Supervised Temporal Action Localization
    GUO Wenbin, YANG Xingming, JIANG Zheyuan, WU Kewei, XIE Zhao
    Computer Engineering and Applications    2023, 59 (10): 151-161.   DOI: 10.3778/j.issn.1002-8331.2201-0233
    Because they use only video-level labels as the supervision signal, weakly supervised temporal action localization models tend to identify only the most distinctive segments of an action instance and to mistake background segments related to the video-level labels for actions, so it is difficult to obtain complete action proposals. To detect action segments more completely, a multi-temporal-scale consensus method for weakly supervised temporal action localization is proposed, which analyzes the consistency of action segments across multiple temporal scales. Firstly, RGB and optical flow features are extracted from the input video frames, and a multi-temporal-scale module with convolution kernels of different sizes models the temporal relationships in the video. Secondly, predicted action labels with multi-temporal-scale consensus are obtained by estimating the temporal class activation map at each scale and fusing the multi-branch activation maps. Finally, to further refine the labels predicted by the model, an iterative optimization strategy updates the predicted labels at each iteration and provides effective frame-level supervision signals for training. Experiments on the THUMOS14 and ActivityNet1.3 datasets show that the proposed network is superior to state-of-the-art methods.
    Multi-Stream Threshold Shrinkage and Fusion Network for Product Surface Defect Detection
    GENG Yubiao, YUE Zhiyuan, YAN Qiming, SUN Yubao
    Computer Engineering and Applications    2023, 59 (10): 162-170.   DOI: 10.3778/j.issn.1002-8331.2201-0405
    The task of product surface defect detection focuses on the automatic detection and segmentation of abnormal defect areas. In practice, the detection of product surface defects remains a challenging task due to the degrading effects of noise and the complexity and variety of defect types. To cope with these problems, this paper proposes a multi-stream threshold shrinkage and fusion network for product surface defect detection. In each stream of different scales, in order to cope with noise corruption, the proposed network configures an adaptive threshold shrinkage denoising module. This module can autonomously learn the horizontal and vertical shrinkage thresholds in the dual branches, and remove the interference noise from the features while retaining the effective background information, therefore realizing adaptive denoising. In order to locate the defective object more accurately, a contextual 3D attention fusion module is designed to generate 3D attention maps by horizontal and vertical aggregation to enhance the abnormal region features. Finally, parallel multi-scale features are fused to achieve effective detection of different scales and different types of defects. This paper compares the constructed model on SD-900 and MVTec-AD datasets with the latest eight methods. The experimental results show that the model in this paper can effectively improve the detection accuracy and maintain robustness to noise interference, and the ablation experiments also verify the effectiveness of the adaptive threshold shrinkage denoising module and the contextual 3D attention fusion module.
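    The adaptive threshold shrinkage idea is closely related to soft thresholding with a learned, feature-dependent threshold. A minimal single-threshold PyTorch sketch (the paper's dual-branch horizontal and vertical thresholds and their fusion are not reproduced):

        import torch
        import torch.nn as nn

        class SoftShrinkDenoise(nn.Module):
            """Learn a per-channel threshold from feature statistics and soft-threshold the features."""

            def __init__(self, channels, reduction=4):
                super().__init__()
                self.fc = nn.Sequential(
                    nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
                    nn.Linear(channels // reduction, channels), nn.Sigmoid(),
                )

            def forward(self, x):                       # x: (B, C, H, W)
                absmean = x.abs().mean(dim=(2, 3))      # per-channel magnitude statistic
                tau = (absmean * self.fc(absmean)).unsqueeze(-1).unsqueeze(-1)
                return torch.sign(x) * torch.relu(x.abs() - tau)   # soft thresholding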
    Cross-Modal Modulating for Multimodal Sentiment Analysis
    CHENG Zichen, LI Yan, GE Jiangwei, JIU Mengfei, ZHANG Jingwei
    Computer Engineering and Applications    2023, 59 (10): 171-179.   DOI: 10.3778/j.issn.1002-8331.2201-0406
    How to effectively represent modalities and efficiently integrate information between modalities has always been a hot issue in the field of multimodal sentiment analysis(MSA). Most of the existing research is based on the Transformer, and the self-attention module is improved to achieve the effect of cross-modal fusion. However, the fusion method based on the Transformer often ignores the importance of different modalities, and the Transformer cannot effectively capture the temporal features. In response to the above problems, a cross-modal modulating and multimodal gating module network is proposed, which uses the LSTM and the BERT as the representation sub-networks of visual, acoustic and text modalities respectively. The improved Transformer cross-modal modulation module is used to effectively fuse different modal information. A modal gating network is designed to simulate the synthetic judgment process of information from different modes. Comparative experiments are carried out using MOSI and MOSEI datasets, and the results show that the proposed method can effectively improve the accuracy of sentiment classification.
    Enhanced Recommendation System Based on Attenuation Propagation for Knowledge Graphs
    CAO Yukun, FANG Yixin, MIAO Zeyu, LI Yunfeng
    Computer Engineering and Applications    2023, 59 (10): 180-186.   DOI: 10.3778/j.issn.1002-8331.2201-0303
    Current knowledge-graph-based recommendation methods make insufficient use of the information along the paths of user interest propagation on the knowledge graph and fail to account for path propagation across different levels. To address these problems, an enhanced recommendation system based on attenuation propagation over knowledge graphs (RSAP) is proposed. RSAP introduces inter-layer and intra-layer interest propagation over the user interest graph on the knowledge graph, extracting user interest embeddings along different directions with attenuation factors that reflect the change of user interest. It then uses a purification network with residual blocks, which captures the focus of the interest embeddings, to obtain the final user embedding for prediction. Experimental results on real-world datasets show that RSAP outperforms state-of-the-art methods.
    Multimodal False News Detection Based on Fusion Attention Mechanism
    LIU Hualing, CHEN Shanghui, QIAO Liang, LIU Yaxin
    Computer Engineering and Applications    2023, 59 (9): 95-103.   DOI: 10.3778/j.issn.1002-8331.2202-0204
    Exploring efficient modal representations and multimodal interaction methods has always been a hot topic in multimodal information detection, for which a new fake news detection method (MAM) is proposed. MAM uses a self-attention mechanism combined with positional encoding and a pre-trained convolutional neural network to extract text and image features respectively. A mixed-attention module is introduced for text-image feature interaction; it uses hierarchical feature processing to reduce the redundant information generated during multimodal interaction. A two-way feature fusion method preserves the integrity of the training information, and the weighted multimodal features are fed into a fully connected network to classify news as true or false. Comparative experiments show that, compared with existing multimodal reference models, the method improves each classification index by about 3 percentage points, and visualization experiments show that the multimodal features obtained by the mixed attention mechanism generalize better.
    Research on Text Classification by Fusing Multi-Granularity Information
    XIN Miaomiao, MA Li, HU Bofa
    Computer Engineering and Applications    2023, 59 (9): 104-111.   DOI: 10.3778/j.issn.1002-8331.2207-0440
    Current research on Chinese text classification focuses on classifying information at a single granularity, such as character, word, sentence or document granularity, and therefore often misses the information carried by the semantics at other granularities. To extract the core content of a text more effectively, a text classification model that fuses multi-granularity information with an attention mechanism is proposed. The model constructs embedding vectors at character, word and sentence granularity: Word2Vec converts the data into character and word vectors, a bidirectional long short-term memory network captures the contextual semantic features of the character- and word-level vectors, the FastText model extracts the features contained in the sentence vectors, and the different feature vectors are fed into an attention layer to obtain further important semantic information about the text. Experimental results show that the classification accuracy of the model on three publicly available Chinese datasets is higher than that of single-granularity models and of pairwise combinations of granularities.
    Neighbor Relation-Aware Graph Convolutional Network for Recommendation
    SUN Aijing, WANG Guoqing
    Computer Engineering and Applications    2023, 59 (9): 112-122.   DOI: 10.3778/j.issn.1002-8331.2112-0438
    Existing recommender systems based on graph neural networks mainly aggregate neighbor information indiscriminately when updating the representation of a target node, so useful prior knowledge from the recommendation scenario itself is not used to distinguish the relationships between users and items. To solve this problem, a neighbor relation-aware graph convolutional network (NRGCN) is proposed, which combines three kinds of prior auxiliary information, rating score, review text and timestamp, to explicitly distinguish the contributions of different neighbors in the neighborhood. Specifically, the user's rating score is introduced as the basis for the closeness of the relation and is then adjusted by the sentiment score of the review text. In addition, considering that user interest changes over time, the timestamp is used to encode the neighbor relation at different times. Extensive experiments on three benchmark datasets show that the proposed model consistently outperforms various state-of-the-art models, with a maximum improvement of 12%.
    Sentence Matching Based on Dependency Syntax and Graph Attention Network
    YANG Chunxia, CHEN Qigang, XU Ben, MA Wenwen
    Computer Engineering and Applications    2023, 59 (9): 123-129.   DOI: 10.3778/j.issn.1002-8331.2112-0391
    Sentence matching is a basic task in natural language processing that can be applied to natural language inference, paraphrase recognition and other scenarios. At present, most mainstream models use attention mechanisms to align words or phrases between two sentences, but they usually ignore the internal structure of sentences and do not consider the dependencies between text units. To solve this problem, this paper proposes a matching model based on dependency syntax and a graph attention network. Two methods are designed to model sentence pairs as semantic graphs, and a graph attention network encodes the constructed graphs for sentence matching. Experimental results show that the proposed model learns the graph structure well, achieving accuracies of 88.7% on the natural language inference dataset SNLI and 88.9% on the paraphrase recognition dataset Quora.
    Joint Extraction of Entities and Relations Model for Single-Step Span-Labeling
    ZHENG Zhaoqian, HAN Dongchen, ZHAO Hui
    Computer Engineering and Applications    2023, 59 (9): 130-139.   DOI: 10.3778/j.issn.1002-8331.2112-0418
    As an upstream task for knowledge graphs and many other fields, relation extraction has wide application value and has received extensive attention in recent years. Current relation extraction models commonly suffer from exposure bias, and the texts to be extracted often contain nested and overlapping entities, which seriously affect model performance. Therefore, this paper proposes a span-labeling based entity-relation extraction model (SLM). The model transforms the entity-relation extraction problem into a span labeling problem: the tokens are combined, arranged and re-tiled into a span sequence, and an LSTM together with a multi-head self-attention mechanism extracts deep semantic features of each span. An entity-relation label is designed, and a multi-layer labeling method is used for relation label classification. Experiments on the English datasets NYT and WebNLG show a significant improvement in F1 over the baseline models, verifying the effectiveness of the model and indicating that it can effectively address the above problems.
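    For illustration, "re-tiling tokens into a span sequence" amounts to enumerating candidate spans up to a maximum width for the model to label; max_len below is a hypothetical hyperparameter, not a value from the paper:

        def enumerate_spans(tokens, max_len=8):
            """Tile a sentence into candidate (start, end, text) spans up to max_len tokens."""
            spans = []
            for i in range(len(tokens)):
                for j in range(i, min(i + max_len, len(tokens))):
                    spans.append((i, j, " ".join(tokens[i:j + 1])))
            return spans

        # e.g. enumerate_spans("Steve Jobs founded Apple".split(), max_len=3)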
    Single-Stage Object Detection with Fusion of Point Cloud and Image Feature
    CAI Zhengyi, ZHAO Jieyu, ZHU Feng
    Computer Engineering and Applications    2023, 59 (9): 140-149.   DOI: 10.3778/j.issn.1002-8331.2112-0555
    3D objects can be detected and classified more effectively when image information supplements the geometric and texture information of the point cloud. Aiming at the problem of effectively integrating image features into point clouds, an end-to-end deep neural network is designed. A novel fusion module named PI-Fusion (point cloud and image fusion) is proposed to enhance the semantic information of the point cloud with image features in a point-wise manner. In addition, during downsampling, a fusion sampling strategy combining distance farthest point sampling and feature farthest point sampling is adopted to retain points on small objects. After three downsampling stages over the fused image and point cloud features, a candidate point generation layer shifts points toward the centers of the target objects. Finally, a single-stage object detection head predicts the classification confidence and the regression results. Experimental results on the KITTI dataset show that, compared with 3DSSD, the proposed method improves by 3.37, 1.92 and 1.58 percentage points on the easy, moderate and hard difficulty levels, respectively.
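    Distance farthest point sampling (D-FPS) is the standard building block here; feature-FPS replaces the coordinates with per-point feature vectors in the same distance computation. A minimal NumPy sketch of D-FPS (the fusion sampling strategy that combines the two is not reproduced):

        import numpy as np

        def farthest_point_sampling(points, k):
            """Select k indices from an (N, 3) point cloud, each farthest from those already chosen."""
            n = points.shape[0]
            selected = np.zeros(k, dtype=np.int64)
            dist = np.full(n, np.inf)
            selected[0] = np.random.randint(n)
            for i in range(1, k):
                diff = points - points[selected[i - 1]]
                dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))  # squared distances
                selected[i] = int(dist.argmax())
            return selected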
    Spatial-Temporal Convolutional Attention Network for Action Recognition
    LUO Huilan, CHEN Han
    Computer Engineering and Applications    2023, 59 (9): 150-158.   DOI: 10.3778/j.issn.1002-8331.2112-0579
    In video action recognition, how fully the correlations between features are learned and exploited, in both the spatial and the temporal dimensions, has a great impact on final recognition performance. Convolution obtains local features by computing correlations between feature points in a neighborhood, while the self-attention mechanism learns global information through interactions among all feature points. A single convolutional layer cannot learn feature correlations from a global perspective, and even stacking multiple layers only yields somewhat larger receptive fields. The self-attention layer, although global, focuses only on the content relationships expressed by different feature points and ignores local positional characteristics. To solve these problems, a spatial-temporal convolutional attention network is proposed for action recognition, composed of a spatial convolutional attention network and a temporal convolutional attention network. The spatial convolutional attention network uses self-attention to capture relationships among appearance features in the spatial dimension and one-dimensional convolution to extract dynamic information. The temporal convolutional attention network obtains correlations between frame-level features in the temporal dimension through self-attention and uses 2D convolution to learn spatial features. The full network fuses the test results of the two sub-networks to improve recognition performance. Experiments on the HMDB51 dataset, with ResNet50 as the baseline and the spatial-temporal convolutional attention module introduced, show recognition accuracy improvements of 6.25 and 5.13 percentage points for the spatial and temporal streams respectively. Compared with current advanced methods, the network has clear advantages on the UCF101 and HMDB51 datasets. By combining the global connectivity of self-attention with the local connectivity of convolution, the proposed network effectively captures feature correlation information and improves the spatial-temporal modeling ability of the neural network.
    Weakly Supervised Person Search Combining Dual-Path Network and Multi-Label Classification
    ZHANG Jianhe, JIANG Xiaoyan
    Computer Engineering and Applications    2023, 59 (9): 159-166.   DOI: 10.3778/j.issn.1002-8331.2201-0024
    Supervised person search relies entirely on person bounding boxes and identity labels. Annotating person bounding boxes in large-scale datasets is easy, but collecting identity associations across multiple cameras is extremely difficult. To remove the dependence on identity labels, a weakly supervised person search method that combines a dual-path network with multi-label classification and uses only bounding box annotations is proposed. To reduce interference from background information caused by detection errors, a panoramic image branch and a cropped image branch are combined to learn dual-path person instance features, and the representation of the semantic information of the person region is enhanced by minimizing the feature discrepancy of the same instance between the two paths. For learning the re-identification feature, a single class label is first assigned to each instance, multi-labels are then predicted with a feature-similarity threshold and mutual nearest neighbors, and the features are learned from the multi-labels with a non-parametric classifier. The experimental results show mAP and top-1 of 84.2% and 86.0% on the CUHK-SYSU dataset and 38.8% and 85.1% on the PRW dataset, respectively, which is excellent compared with the latest methods.
    Chinese Negative Semantic Representation and Annotation Combined with Hybrid Attention Mechanism and BiLSTM-CRF
    LI Jinrong, LYU Guoying, LI Ru, CHAI Qinghua, WANG Chao
    Computer Engineering and Applications    2023, 59 (9): 167-175.   DOI: 10.3778/j.issn.1002-8331.2201-0088
    Negation is a complex language phenomenon in reading comprehension that often reverses the polarity of emotion or attitude, so the correct analysis of negative semantics is of great significance for discourse understanding. Existing negative semantic analysis methods have two problems: first, they cover too few negative words to be useful in applications; second, current Chinese negative semantic annotation only labels whole sentences and cannot make the negative semantics explicit. To solve these problems, a negative semantic role annotation method based on Chinese FrameNet is proposed. Firstly, under the guidance of frame semantics theory and combined with the semantic characteristics of Chinese negation, the negation frame inherited from FrameNet is reconstructed. Secondly, to capture long-distance information and syntactic features, a BiLSTM-CRF semantic role labeling model based on a hybrid attention mechanism is proposed: the hybrid attention layer combines local and global attention to accurately represent the negative semantics in a sentence, the BiLSTM layer automatically learns and extracts sentence context information, and the CRF layer predicts the optimal negative semantic role labels. Comparative experiments verify that the model effectively extracts negative semantic information, reaching an F1 value of 89.82% on the negative semantic frame dataset.
    Incomplete Multi-View Clustering Algorithm with Adaptive Graph Fusion
    HUANG Zhanpeng, WU Jiekang, YI Faling
    Computer Engineering and Applications    2023, 59 (9): 176-181.   DOI: 10.3778/j.issn.1002-8331.2201-0192
    Multi-view clustering can make full use of the consistency and difference of samples across different views and has attracted increasing attention. Traditional multi-view clustering methods assume that the samples of each view are complete; however, multi-view data collected in practical applications are usually incomplete. To perform clustering analysis on incomplete multi-view data, an incomplete multi-view clustering algorithm with adaptive graph fusion (IMC_AGF) is proposed. In IMC_AGF, the samples shared between two views are used as anchors to construct a sample-sample similarity matrix by learning their consistency knowledge. The complementarity between views is then exploited to integrate all the similarity graphs with an adaptive graph fusion method, yielding the final similarity matrix of the incomplete multi-view data. Finally, spectral clustering is applied to obtain the clustering result. The experimental results show that the proposed algorithm is superior to classical multi-view clustering methods.
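    Once the per-view similarity matrices are fused into a single affinity, the last step is ordinary spectral clustering on that matrix. A minimal scikit-learn sketch, with fixed fusion weights standing in for the adaptively learned ones in the paper:

        import numpy as np
        from sklearn.cluster import SpectralClustering

        def cluster_fused_views(similarity_mats, weights, n_clusters):
            """Fuse per-view similarity matrices and run spectral clustering on the result."""
            fused = sum(w * s for w, s in zip(weights, similarity_mats))
            fused = (fused + fused.T) / 2                      # keep the affinity symmetric
            model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
            return model.fit_predict(fused)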
    End-to-End Triple Extraction Incorporated Formal Concept of Relation
    CHENG Chunlei, ZOU Jing, YE Qing, ZHANG Suhua, LAN Yong, YANG Rui
    Computer Engineering and Applications    2023, 59 (9): 182-189.   DOI: 10.3778/j.issn.1002-8331.2201-0418
    Triple extraction is fundamental to knowledge learning and knowledge graph construction. Current models for entity recognition and relation extraction suffer from weak semantic association, entity nesting, relation overlap, and insufficient attention to existing concept knowledge. Combining formal concepts with a neural network model, an end-to-end triple extraction method based on the formal concept of relation is proposed. The model introduces relation formal-concept labels to unify the semantic expression of entities and relations, converting the entity recognition problem into a concept label learning problem. Entities are fed into the relational formal-concept attention model, where the attention mechanism captures the connotation features connected to the main relation's object concept, that is, the comprehensive features of the subject and object corresponding to each relation label together with their context-dependent predicates. Multiple relation classifiers then output the multiple relation labels of each subject-object pair, realizing concept-based multi-relation extraction. In addition, the model can introduce the extension and connotation of existing formal concepts to reduce the dependence on corpus annotations and ease the labeling difficulties caused by entity nesting. Results on two datasets demonstrate that the proposed model is practically effective for knowledge extraction and alleviates the entity nesting and relation overlap problems.
    Aspect-Based Sentiment Analysis with Cross-Heads Attention
    ZHOU Runmin, HU Xuyao, WU Kewei, YU Lei, XIE Zhao, JIANG Long
    Computer Engineering and Applications    2023, 59 (9): 190-197.   DOI: 10.3778/j.issn.1002-8331.2201-0454
    Aspect-based sentiment analysis aims to identify the positive, negative and neutral sentiment toward aspect words in sentences; the key is to learn the relationships between aspect words and the other words in the sentence. When learning word relationships, existing gated convolutional networks use temporal convolution, whose local time window cannot describe the relationship between arbitrary word pairs, and the heads of existing temporal attention models analyze word relationships independently of each other. To analyze the complex relationships between aspect words and other words in a sentence, this paper proposes a sentiment analysis model based on cross-heads attention and a gated convolutional network. Firstly, for the given word-vector features, a cross-heads attention module is designed: it applies cross-head linear mappings to the matching scores of the query and key vectors in multi-head attention, integrating the matching scores across heads to describe the more complex contextual relationships of aspect words. Secondly, a gated convolutional network encodes local word relationships, and a word position encoding module provides positional features so that the effect of position encoding on word-relationship analysis can be examined. Finally, temporal pooling over the encoded word features produces the sentence representation, and a fully connected classifier predicts the sentiment label. Experimental analysis on the Rest14 and Laptop14 datasets shows that the method effectively estimates the score relationships between aspect words and other words.
    Reference | Related Articles | Metrics
    Dense Road Vehicle Detection Based on Lightweight ConvLSTM
    JIN Zhi, ZHANG Qian, LI Xiying
    Computer Engineering and Applications    2023, 59 (8): 89-96.   DOI: 10.3778/j.issn.1002-8331.2112-0408
    Abstract101)      PDF(pc) (808KB)(76)       Save
    Aiming at the false negatives and false positives caused by target occlusion in congested scenes, and considering that the occlusion degree of the same vehicle varies across video frames, so that features from moments when the vehicle is unobstructed can aid its detection at the current moment, WB-YOLO v5, suitable for dense scenes, is proposed. Based on the input data structure of ConvLSTM, feature selection and feature sparsification modules are designed to recalibrate features. The outputs of these two modules are fed into different branches of the ConvLSTM to enhance or attenuate features at different times. Then 1×1 convolutions replace the original gating structure, yielding a lightweight WBConvLSTM that reduces the number of parameters and computations while improving the training speed and the detection accuracy of targets from small-sample data sources. Finally, WBConvLSTM is introduced into the neck of YOLO v5 to enhance the feature extraction ability of the network. Experimental results show that WB-YOLO v5 improves mAP by 1.83 percentage points over YOLO v5, and that WBConvLSTM reduces the number of parameters and computations by about 2/3 and 6/13, respectively, compared with ConvLSTM.
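    A minimal sketch of the lightweight idea, assuming only what the abstract states: a ConvLSTM-style cell whose gates are computed with 1×1 convolutions instead of larger spatial kernels. The cell below is illustrative, not the authors' WBConvLSTM, and omits the feature selection and sparsification branches.

        import torch
        import torch.nn as nn

        class LightConvLSTMCell(nn.Module):
            """Sketch: ConvLSTM cell with 1x1 convolutional gates (parameter-light variant)."""
            def __init__(self, in_ch: int, hid_ch: int):
                super().__init__()
                # one 1x1 conv produces all four gates from the [input, hidden] concatenation
                self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size=1)
                self.hid_ch = hid_ch

            def forward(self, x, state=None):
                b, _, h, w = x.shape
                if state is None:
                    hprev = x.new_zeros(b, self.hid_ch, h, w)
                    cprev = x.new_zeros(b, self.hid_ch, h, w)
                else:
                    hprev, cprev = state
                i, f, o, g = self.gates(torch.cat([x, hprev], dim=1)).chunk(4, dim=1)
                i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
                c = f * cprev + i * g          # attenuate old features, enhance new ones
                hcur = o * torch.tanh(c)
                return hcur, (hcur, c)

    Replacing a k×k gate convolution with a 1×1 one divides its weight count by k², which is the source of the parameter savings in this kind of gating.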
    Reference | Related Articles | Metrics
    Research on COVID-19 Text Entity Relation Extraction and Dataset Construction Methods
    YANG Chongluo, SHENG Long, WEI Zhongcheng, WANG Wei
    Computer Engineering and Applications    2023, 59 (8): 97-104.   DOI: 10.3778/j.issn.1002-8331.2205-0518
    Abstract83)      PDF(pc) (532KB)(41)       Save
    Entity relation extraction can effectively obtain key information from text, and the key information in COVID-19 texts can help cut off epidemic transmission routes and trace the source of outbreaks. However, no suitable public annotated dataset exists in this field. To solve this problem, the semantic representation and structural characteristics of COVID-19 texts are analyzed, an entity-relation definition for COVID-19 texts is proposed, and the collected data are annotated with entities and relations according to this definition; after annotation, a COVID-19 text entity relation extraction dataset is generated through data preprocessing and related operations. Compared with public datasets, the dataset in this field has a denser distribution of entities and relations, and a single neural network model extracts its features poorly. Therefore, multiple neural network models are spliced to construct a named entity recognition model and a relation extraction model. The dataset is experimentally validated with these models, and the results show that it can be applied to the entity relation extraction task in this field.
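    The "splicing" of multiple neural network models can be pictured with the hypothetical sketch below, which concatenates the outputs of a CNN branch and a BiLSTM branch before a per-token tag classifier; the actual models combined in the paper may differ (for example, a CRF output layer is common in NER).

        import torch
        import torch.nn as nn

        class SplicedNER(nn.Module):
            """Sketch: CNN and BiLSTM encoders spliced (concatenated) for sequence labeling."""
            def __init__(self, vocab_size: int, emb_dim: int, hid_dim: int, num_tags: int):
                super().__init__()
                self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
                self.cnn = nn.Conv1d(emb_dim, hid_dim, kernel_size=3, padding=1)
                self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
                self.tagger = nn.Linear(hid_dim + 2 * hid_dim, num_tags)

            def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
                x = self.emb(token_ids)                                              # (b, n, emb_dim)
                cnn_feat = torch.relu(self.cnn(x.transpose(1, 2))).transpose(1, 2)   # (b, n, hid)
                lstm_feat, _ = self.lstm(x)                                          # (b, n, 2*hid)
                spliced = torch.cat([cnn_feat, lstm_feat], dim=-1)                   # feature splicing
                return self.tagger(spliced)                                          # per-token tag logits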
    Reference | Related Articles | Metrics
    Regularized Extraction of Remotely Sensed Image Buildings Using U-Shaped Networks
    DAI Chao, LIU Ping, SHI Juncai, REN Hongjie
    Computer Engineering and Applications    2023, 59 (8): 105-116.   DOI: 10.3778/j.issn.1002-8331.2203-0564
    Abstract74)      PDF(pc) (1049KB)(21)       Save
    Aiming at the problem that the bilinear interpolation and transposed convolution used in fully convolutional neural networks cannot accurately restore object outlines in high-resolution remote sensing building extraction, an improved ResNeXt_SPP_Unet fully convolutional network is built on the Unet architecture, and an improved Douglas-Peucker image post-processing algorithm is proposed to regularize the extracted buildings. ResNeXt_SPP_Unet is optimized in two aspects: first, the standard convolutions in Unet are replaced with ResNeXt blocks, which reduces the amount of computation and improves segmentation accuracy; second, an SPP pyramid pooling layer is introduced at the end of the encoder, where multi-scale feature fusion improves the segmentation accuracy of object edges. Experimental comparison shows that the improved ResNeXt_SPP_Unet outperforms classical segmentation networks such as Unet and recent networks such as ResUnet++ on high-resolution remote sensing building extraction, with a mean intersection over union of 0.853 8 and a mean pixel accuracy of 0.935 9. Finally, the improved Douglas-Peucker algorithm, extended with operations such as rotating and connecting building-outline edges, is applied after the ResNeXt_SPP_Unet model to better fit the true building outlines and regularize the building boundaries, with good results.
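    The SPP module mentioned above can be sketched as pooling the encoder's final feature map at several scales, upsampling each pooled map back to the original resolution, and concatenating them with the input; the scales and channel choices below are assumptions rather than the paper's configuration.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class SPPBlock(nn.Module):
            """Sketch: spatial pyramid pooling with multi-scale feature fusion."""
            def __init__(self, in_ch: int, out_ch: int, scales=(1, 2, 4, 8)):
                super().__init__()
                self.scales = scales
                # 1x1 convs reduce each pooled branch before fusion
                self.branches = nn.ModuleList(
                    [nn.Conv2d(in_ch, out_ch, kernel_size=1) for _ in scales]
                )
                self.fuse = nn.Conv2d(in_ch + len(scales) * out_ch, in_ch, kernel_size=1)

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                h, w = x.shape[-2:]
                feats = [x]
                for s, conv in zip(self.scales, self.branches):
                    pooled = F.adaptive_avg_pool2d(x, output_size=s)        # (b, c, s, s)
                    feats.append(F.interpolate(conv(pooled), size=(h, w),
                                               mode='bilinear', align_corners=False))
                return self.fuse(torch.cat(feats, dim=1))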
    Reference | Related Articles | Metrics
    Face Recognition Method Based on Improved Visual Transformer
    JI Ruirui, XIE Yuhui, LUO Fengkai, MEI Yuan
    Computer Engineering and Applications    2023, 59 (8): 117-126.   DOI: 10.3778/j.issn.1002-8331.2208-0182
    Abstract107)      PDF(pc) (768KB)(86)       Save
    Most current face recognition methods rely on convolutional neural networks, which build cascaded multi-layer processing units and fuse local features through convolution, thereby neglecting global semantic information and the key regions of the face image. This paper proposes a face recognition method based on an improved visual Transformer. Shuffle Transformer is introduced as the backbone for feature extraction; the self-attention mechanism and the Shuffle operation capture global information of the feature map and establish long-range dependencies between feature points, enhancing the model's feature perception ability. Meanwhile, drawing on the characteristics of the ArcFace loss and the center loss, a fused loss is designed as the objective function, which uses intra-class constraints to enlarge the angular margin and increase the discriminability of the feature space. The proposed method achieves average accuracies of 99.83%, 95.87%, 90.05%, 98.05% and 97.23% on five challenging benchmark face datasets, LFW, CALFW, CPLFW, AGEDB-30 and CFP, showing that the improved model effectively promotes face feature extraction and achieves better recognition than convolutional neural networks of comparable scale.
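    A minimal sketch of the fused objective, assuming the standard ArcFace formulation (normalized features and class weights with an additive angular margin) combined with a weighted center-loss term; the margin, scale, and weighting values are placeholders rather than the paper's settings.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ArcFaceCenterLoss(nn.Module):
            """Sketch: ArcFace angular-margin loss plus a center-loss regularizer."""
            def __init__(self, feat_dim: int, num_classes: int,
                         s: float = 64.0, m: float = 0.5, center_weight: float = 0.01):
                super().__init__()
                self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
                self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
                self.s, self.m, self.center_weight = s, m, center_weight

            def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
                # ArcFace: cosine of the angle between normalized features and class weights
                cos = F.linear(F.normalize(feats), F.normalize(self.weight)).clamp(-1 + 1e-7, 1 - 1e-7)
                theta = torch.acos(cos)
                target = F.one_hot(labels, cos.size(1)).bool()
                logits = self.s * torch.where(target, torch.cos(theta + self.m), cos)
                arc_loss = F.cross_entropy(logits, labels)
                # center loss: pull features toward their class centers (intra-class constraint)
                center_loss = ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()
                return arc_loss + self.center_weight * center_loss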
    Reference | Related Articles | Metrics
    Bimodal Emotion Recognition Model Based on Cascaded Two Channel Phased Fusion
    XU Zhijing, LIU Xia
    Computer Engineering and Applications    2023, 59 (8): 127-137.   DOI: 10.3778/j.issn.1002-8331.2111-0542
    Abstract75)      PDF(pc) (735KB)(31)       Save
    To fully extract the deep emotional features of text and speech and achieve effective interactive fusion between the two modalities, a bimodal emotion recognition model based on cascaded two-channel phased fusion (CTC-PF) is proposed. First, a cascaded sequential attention encoder (CSA-Encoder) is designed to process long-distance speech emotion sequences in parallel and extract deep speech emotional features. Then, an affective-field cascade encoder (AFC-Encoder) is designed to improve the text feature extractor's global and local understanding and address the sparsity of key emotional features in text. After the two cascaded channels complete feature extraction for speech and text, a collaborative attention mechanism interactively fuses the important emotional features of the two modalities, reducing the cost of alignment operations; a Hadamard product then performs a second-stage fusion to capture difference features and compensate for insufficient cross-modal interaction of emotional information. This phased fusion realizes information interaction between modal sequences at different time steps. Classification experiments on the IEMOCAP dataset show that the model reaches an accuracy of 79.4% and an F1-score of 79.0%. Compared with existing mainstream methods, its performance is significantly improved, demonstrating the superiority of the proposed fusion model for speech-text bimodal emotion recognition.
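    The phased fusion can be roughly illustrated as below: a collaborative (cross-modal) attention step in which each modality queries the other, followed by a Hadamard-product step that captures element-wise difference/interaction features between the two aligned representations. This is a sketch under those assumptions and omits the CSA-Encoder and AFC-Encoder; names and dimensions are placeholders.

        import torch
        import torch.nn as nn

        class PhasedBimodalFusion(nn.Module):
            """Sketch: cross-modal co-attention followed by Hadamard-product fusion."""
            def __init__(self, dim: int, heads: int = 4, num_classes: int = 4):
                super().__init__()
                self.text_to_speech = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.speech_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.classifier = nn.Linear(3 * dim, num_classes)

            def forward(self, text_feat: torch.Tensor, speech_feat: torch.Tensor) -> torch.Tensor:
                # phase 1: collaborative attention, each modality attends to the other
                t_ctx, _ = self.text_to_speech(text_feat, speech_feat, speech_feat)
                s_ctx, _ = self.speech_to_text(speech_feat, text_feat, text_feat)
                t_vec, s_vec = t_ctx.mean(dim=1), s_ctx.mean(dim=1)   # temporal pooling
                # phase 2: Hadamard product captures element-wise interaction features
                inter = t_vec * s_vec
                fused = torch.cat([t_vec, s_vec, inter], dim=-1)
                return self.classifier(fused)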
    Reference | Related Articles | Metrics