Pattern Recognition and Artificial Intelligence: Contents in This Journal

    Improved Behavioral Cloning and DDPG’s Driverless Decision Model
    LI Weidong, HUANG Zhenzhu, HE Jingwu, MA Caoyuan, GE Cheng
    Computer Engineering and Applications    2024, 60 (14): 86-95.   DOI: 10.3778/j.issn.1002-8331.2304-0158
    The key to driverless technology is that the decision-making layer issues accurate instructions based on the information provided by the perception stage. Reinforcement learning and imitation learning are better suited to complex scenarios than traditional rule-based methods. However, imitation learning as represented by behavioral cloning suffers from compounding errors; this paper therefore uses a prioritized experience replay algorithm to improve behavioral cloning and strengthen the model's ability to fit the demonstration dataset. The original DDPG (deep deterministic policy gradient) algorithm suffers from low exploration efficiency, so experience pool separation and random network distillation (RND) are used to improve the DDPG algorithm and its training efficiency. The improved algorithms are used for joint training to reduce useless exploration in the early stage of DDPG training. Verified on the TORCS (the open racing car simulator) platform, the experimental results show that the proposed method achieves more stable lane keeping, speed keeping, and obstacle avoidance within the same number of training episodes.
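    A minimal sketch of proportional prioritized experience replay, the mechanism the abstract pairs with behavioral cloning; the priority exponent and epsilon below are illustrative assumptions, not values from the paper.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities bias sampling (assumed value)
        self.eps = eps            # keeps every priority strictly positive
        self.data, self.priorities = [], []

    def add(self, transition, td_error=1.0):
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```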
    Lightweight Face Recognition Algorithm Combining Transformer and CNN
    LI Ming, DANG Qingxia
    Computer Engineering and Applications    2024, 60 (14): 96-104.   DOI: 10.3778/j.issn.1002-8331.2311-0276
    With the development of deep learning, convolutional neural networks have become the mainstream approach for face recognition (FR), gradually expanding the receptive field by stacking convolutional layers to integrate local features. However, this approach neglects the global semantic information of faces and lacks attention to important facial features, resulting in low recognition accuracy. Additionally, the large number of parameters and stacked layers makes it difficult to deploy the network on resource-constrained devices. Therefore, a highly lightweight face recognition algorithm called gcsamTfaceNet is proposed, which combines Transformer and CNN. Firstly, depthwise separable convolutions are used to construct the backbone network in order to reduce the parameter count. Secondly, a channel-spatial attention mechanism is introduced to optimize feature selection in both the channel and spatial domains, thereby increasing the attention given to important facial regions. Building upon this, a Transformer module is integrated to capture the global semantic information of the feature maps, overcoming the limitation of convolutional neural networks in modeling long-range semantic dependencies and enhancing the algorithm's ability to perceive global features. With a parameter count of only 6.5×10^5, gcsamTfaceNet is evaluated on nine validation datasets: LFW, CA-LFW, CP-LFW, CFP-FP, CFP-FF, AgeDB-30, VGG2-FP, IJB-B, and IJB-C, achieving average accuracies of 99.67%, 95.60%, 89.32%, 93.67%, 99.65%, 96.35%, 93.36%, 89.43%, and 91.38%, respectively. This demonstrates a good balance between parameter count and performance.
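    A minimal sketch of the depthwise separable convolution block used to build lightweight backbones of this kind; the channel counts, kernel size, and activation are placeholders, not the paper's exact configuration.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: per-channel spatial filtering
    followed by a 1x1 pointwise projection (illustrative block)."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```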
    Classification Network Integrated with Multidimensional Attention Strategy of Underground Disaster Images
    ZHANG Gong, XU Ming, LI Kaipeng, WANG Bin
    Computer Engineering and Applications    2024, 60 (14): 105-113.   DOI: 10.3778/j.issn.1002-8331.2304-0175
    Ground penetrating radar (GPR) is a non-destructive exploration technology that uses high-frequency ultra-wideband signals to detect the distribution of subsurface objects and media. Benefiting from its non-destructiveness, high efficiency, and high resolution, GPR has been widely applied to underground defect detection in urban roads. GPR B-scan images, which depict the radar echoes of subsurface structures, are a primary means of detecting underground disasters; however, compared with natural images, their automatic interpretation is more challenging because the same object can produce different signatures, different objects can produce the same signature, and the images suffer from heavy noise. To improve the accuracy of subsurface disaster detection, a disaster classification network, MA-ResNeXt, based on ResNeXt50 is proposed by combining a multi-dimensional attention mechanism, atrous spatial pyramid pooling, and a multi-scale feature extraction structure. The proposed network is trained and tested on real GPR B-scan images of three common subsurface disasters: void, cavity underneath pavement (CUP), and loosely infilled void (LIV). The comparison results show that the classification accuracy of the proposed network approaches 98.2%, demonstrating that the network can effectively realize accurate classification of underground disasters.
    Motion Planning for Autonomous Driving in Dense Traffic Scenarios
    XIAO Yuwei, YAO Xizi, HU Xuemin, LUO Xianzhi
    Computer Engineering and Applications    2024, 60 (14): 114-122.   DOI: 10.3778/j.issn.1002-8331.2304-0208
    Aiming at the problems that existing motion planning methods for autonomous driving ignore the interaction of surrounding vehicles when extracting state information and plan poorly in dense traffic scenarios, a motion planning model combining graph neural networks and deep reinforcement learning is proposed. Firstly, an interactive feature representation method for self-driving vehicles based on a graph neural network is proposed to extract the spatial interaction features of multiple traffic participants. On this basis, a learning strategy for motion planning is designed based on twin delayed deep deterministic policy gradient (TD3), which predicts the next action from the interactive features to realize motion planning. The proposed method is compared with current motion planning models for autonomous driving, LSTM+TD3, TD3, and deep deterministic policy gradient (DDPG), in dense traffic scenarios. In training and testing experiments on the PGDrive driving simulator, the proposed method improves on the comparison methods by 36%, 43%, and 23% and by 13, 19, and 53 percentage points, respectively, which shows that it can effectively solve the problem of perceiving interactive information from surrounding vehicles and achieve better motion planning for autonomous driving.
    Fin-BERT-Based Event Extraction Method for Chinese Financial Domain
    LI Yi, GENG Chaoyang, YANG Dan
    Computer Engineering and Applications    2024, 60 (14): 123-132.   DOI: 10.3778/j.issn.1002-8331.2304-0224
    Event extraction aims to extract information of interest from massive amounts of unstructured text. Currently, most existing event extraction methods are based on general corpora and rarely consider domain-specific prior knowledge. Moreover, most methods cannot handle documents that contain multiple events well, and they perform poorly when faced with a large number of negative examples. To address these issues, this paper proposes a model called Fin-PTPCG based on Fin-BERT (financial bidirectional encoder representations from Transformers) and PTPCG (pseudo-trigger-aware pruned complete graph). The method fully utilizes the expressive ability of the Fin-BERT pre-trained model and incorporates domain-specific prior knowledge during the encoding stage. In the event detection module, multiple binary classifiers are stacked so that the model can effectively identify documents containing multiple events and screen out negative examples. Combined with the decoding module of the PTPCG model, entities are extracted and connected into a complete graph, which is pruned by computing a similarity matrix. The problem of unlabeled triggers is solved by selecting pseudo-triggers. Finally, event extraction is completed by the event classifier. The method achieves improvements of 0.7 and 3.7 percentage points in F1 score over the baselines on the ChFinAnn and Duee-fin datasets, respectively.
    Graph Convolutional Neural Networks Optimized by Momentum Cosine Similarity Gradient
    YAN Jianhong, DUAN Yunhui
    Computer Engineering and Applications    2024, 60 (14): 133-143.   DOI: 10.3778/j.issn.1002-8331.2304-0250
    The traditional gradient descent algorithm only uses an exponentially weighted accumulation of historical gradients and does not exploit local changes in the gradient, which causes the optimization process to overshoot the global optimum; even if it converges to the optimum, it oscillates around it. Using it to train graph convolutional neural networks results in slow convergence and low test accuracy. In this paper, cosine similarity is used to dynamically adjust the learning rate, yielding the cosine similarity gradient descent (SimGrad) algorithm. To further improve the convergence speed and test accuracy of graph convolutional neural network training and to reduce oscillation, the momentum cosine similarity gradient descent (NSimGrad) algorithm is proposed by incorporating the momentum idea. Convergence analysis proves that the regret bounds of the SimGrad and NSimGrad algorithms are O(√T). Tests on three constructed non-convex functions and experiments on four datasets with graph convolutional neural networks show that SimGrad ensures the convergence of the graph convolutional neural network, while NSimGrad further improves the convergence speed and test accuracy of training. SimGrad and NSimGrad have better global convergence and optimization ability than Adam and Nadam.
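    A loose sketch of the idea of scaling the step size by the cosine similarity between the current gradient and an exponential moving average of past gradients; the abstract does not give the exact SimGrad/NSimGrad update rule, so the formula below is an assumption for illustration only.

```python
import numpy as np

def simgrad_step(w, grad, state, base_lr=0.01, beta=0.9, eps=1e-12):
    """One illustrative update: the learning rate is modulated by the cosine
    similarity between the current gradient and an EMA of past gradients.
    This is an assumed formulation, not the exact SimGrad rule."""
    m = state.get("m", np.zeros_like(grad))
    m = beta * m + (1 - beta) * grad                      # momentum / EMA of gradients
    cos = float(np.dot(m, grad) / (np.linalg.norm(m) * np.linalg.norm(grad) + eps))
    lr = base_lr * (1.0 + cos) / 2.0                      # shrink the step when directions disagree
    state["m"] = m
    return w - lr * m, state
```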
Multi-Information Enhanced Graph Convolutional Network for Aspect Sentiment Analysis
    YANG Chunxia, YAN Han, WU Yalei, HUANG Yukun
    Computer Engineering and Applications    2024, 60 (14): 144-151.   DOI: 10.3778/j.issn.1002-8331.2305-0376
    Aspect-level sentiment analysis aims to predict the sentiment polarity of specific aspects in a sentence. However, current research still makes insufficient use of semantic information. On the one hand, most existing work focuses on learning the dependency information between context words and aspect words and does not make full use of the semantic information of the sentence; on the other hand, existing research pays little attention to the syntactic structure of the dependency tree and therefore does not fully use grammatical structure information to supplement the semantic information. In view of these problems, this paper proposes a multi-information enhanced graph convolutional network (MIE-GCN) model. It mainly includes two parts: the first forms a multi-information fusion layer through aspect-aware attention, self-attention, and external common sense to make full use of semantic information; the second constructs a grammatical mask matrix for the sentence according to the different grammatical distances between words, supplementing the semantic information with comprehensive grammatical structure information. Finally, a graph convolutional network is used to enhance the node representations. Experimental results on benchmark datasets show that the proposed model achieves a certain improvement over the comparison models.
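    A minimal sketch of building a mask matrix from distances on a dependency tree, the kind of grammatical mask the abstract describes; the head-list input format and the distance threshold are illustrative assumptions.

```python
import numpy as np
from collections import deque

def syntactic_mask(heads, max_distance=2):
    """Build a mask matrix from dependency-tree distances.

    heads[i] is the index of token i's head (-1 for the root); word pairs
    whose tree distance exceeds max_distance are masked out.  The threshold
    is an assumed setting, not the value used in the paper."""
    n = len(heads)
    adj = np.zeros((n, n), dtype=bool)
    for i, h in enumerate(heads):
        if h >= 0:
            adj[i, h] = adj[h, i] = True
    mask = np.zeros((n, n))
    for s in range(n):                      # BFS from every token
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.nonzero(adj[u])[0]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for t, d in dist.items():
            if d <= max_distance:
                mask[s, t] = 1.0
    return mask

# Toy example: four tokens whose heads all point to the last token
print(syntactic_mask([3, 3, 3, -1], max_distance=1))
```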
    Chinese Medical Q&A Matching Model Based on Multi-Granularity Semantic Information and Knowledge Graph
    GUAN Liben, LI Shi
    Computer Engineering and Applications    2024, 60 (14): 152-161.   DOI: 10.3778/j.issn.1002-8331.2305-0453
    Chinese medical Q&A is easily affected by the noise of domain-specific medical terminology, making it more challenging than open-domain Q&A. Previous studies on Chinese medical Q&A mainly relied on character-level fine-grained information, neglecting word-level coarse-grained information that carries more semantic content. In addition, introducing an external medical knowledge graph can further enrich the fine-grained information in Q&A sentences, but most existing studies adopt a simple joint representation of sentences and external knowledge. Therefore, this paper proposes a Chinese medical Q&A matching model based on multi-granularity semantic information and a knowledge graph (CMQA-MGSI). The model employs a Lattice network to select the most relevant character-level and word-level sequences from the Q&A sentences and leverages Word2Vec and BERT to enhance the semantic information; to better exploit external domain knowledge, a dual-channel attention mechanism is devised to capture multi-angle knowledge representations between the Q&A sentences and the entity and relation embeddings in the knowledge graph. Experiments on the cMedQA1.0 and cMedQA2.0 datasets demonstrate that the proposed model outperforms existing Chinese medical Q&A matching models.
    Multimodal Feature Adaptive Fusion for Fake News Detection
    WANG Teng, ZHANG Dawei, WANG Liqin, DONG Yongfeng
    Computer Engineering and Applications    2024, 60 (13): 102-112.   DOI: 10.3778/j.issn.1002-8331.2303-0316
    To address the difficulty of fully exploiting image and text information in multimodal news detection on social media and to explore efficient multimodal information interaction, an adaptive multimodal feature fusion model for fake news detection is proposed. First, the model extracts and represents news text semantic features, text emotional features, and image-text semantic difference features; then, the features are weighted, concatenated, and fused through adaptive weight parameters to reduce the redundancy introduced by concatenation; finally, the fused features are fed to the classifier. Experimental results show that the proposed model outperforms current state-of-the-art models on evaluation metrics such as F1 score, effectively improving fake news detection performance and providing strong support for detecting fake news on social media.
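    A minimal sketch of adaptive weighted fusion: learnable weights are normalized and applied to each feature branch before concatenation. The number of branches and softmax normalization are assumptions for illustration, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Fuse several feature vectors with learnable, softmax-normalized weights
    before concatenation (illustrative sketch of adaptive weighted fusion)."""

    def __init__(self, num_branches=3):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_branches))

    def forward(self, feats):               # feats: list of (batch, dim_i) tensors
        w = torch.softmax(self.weights, dim=0)
        scaled = [w[i] * f for i, f in enumerate(feats)]
        return torch.cat(scaled, dim=-1)    # weighted concatenation fed to the classifier

# Usage sketch: fusion = AdaptiveFusion(3); z = fusion([text_sem, text_emotion, diff_feat])
```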
    Research on 3D Object Detection Method Based on Multi-Modal Fusion
    TIAN Feng, ZONG Neili, LIU Fang, LU Yuanyuan, LIU Chao, JIANG Wenwen, ZHAO Ling, HAN Yuxiang
    Computer Engineering and Applications    2024, 60 (13): 113-123.   DOI: 10.3778/j.issn.1002-8331.2309-0217
    Aiming at the problem that detection algorithms based on pure point clouds are prone to missed and false detections of distant small targets due to the sparsity and disorder of point clouds, a multi-modal 3D object detection algorithm combining image features and point cloud voxel features is proposed. In the image feature extraction stage, a lightweight deep residual network is proposed to reduce the number of image feature channels and make it consistent with the point cloud voxel features, thus improving the fusion of point cloud and image features. In the fusion stage of voxel features and image features, a dual feature fusion network is proposed; while retaining the structural information of the original voxel features, it fuses image features with voxel features so that the point cloud carries rich semantic information, improving the detection accuracy of distant small targets. Experimental results on the KITTI dataset show that, compared with the baseline model, the 3D average detection accuracy for car, cyclist, and pedestrian is improved by 0.76, 2.30, and 3.43 percentage points, respectively, verifying the effectiveness of the proposed method in addressing missed and false detections of distant small targets.
    Cross-Modal Transformer Combination Model for Sentiment Analysis
    WANG Liang, WANG Yi, WANG Jun
    Computer Engineering and Applications    2024, 60 (13): 124-135.   DOI: 10.3778/j.issn.1002-8331.2302-0238
    The Transformer-based end-to-end combined deep learning model is the mainstream model for multimodal sentiment analysis. To address the weak sentiment feature extraction from low-resource modalities, the differing feature scales of non-aligned data across modalities, which cause key feature information to be lost during alignment and fusion, and the unreliable multimodal long-term dependency mechanism caused by traditional attention models processing multimodal data in parallel, this paper proposes a sentiment analysis model, LAACMT, based on a lightweight attention aggregation module and a cross-modal Transformer, which can perform binary and multiclass classification on non-aligned multimodal data. The model extracts low-resource modality information using a gated recurrent unit (GRU) and an improved feature extraction algorithm, proposes positional encoding and convolutional scaling methods for aligning multimodal contexts, and proposes a multimodal multi-head attention mechanism that fuses the aligned multimodal data and establishes a reliable cross-modal long-term dependency mechanism. Experimental results on CMU-MOSI, a non-aligned dataset containing text, audio, and video modalities, show that the model's performance metrics improve steadily over the SOTA: Acc7 improves by 3.96%, Acc2 by 4.08%, and F1 score by 3.35%. Ablation studies show that the proposed model addresses the above problems in multimodal sentiment analysis, reduces the complexity of Transformer-based multimodal sentiment analysis models, improves performance, and avoids over-fitting.
    Incorporating Relation Path and Entity Neighborhood Information for Knowledge Graph Completion Method
    ZHAI Sheping, KANG Xinnian, LI Fangyi, YANG Rui
    Computer Engineering and Applications    2024, 60 (13): 136-142.   DOI: 10.3778/j.issn.1002-8331.2303-0369
    Knowledge graphs provide underlying technical support for many AI applications, including e-commerce, smart navigation, healthcare, and social media. However, existing knowledge graphs are usually sparse, and a large amount of hidden knowledge has not been mined, so knowledge graph completion has become an urgent problem. Most existing methods process entity neighborhood information or relation paths independently, ignoring the importance of entity neighborhood information to the relation path exploration process. Therefore, a knowledge graph completion method fusing relation paths and entity neighborhood information (RPEN-KGC) is proposed. RPEN-KGC consists of a sampler and a reasoner. The sampler provides expert paths for the reasoner by randomly walking between entity pairs, while an entity neighborhood similarity comparison mechanism constrains the direction of the random walk to enrich the expert paths. By extracting semantic features from the relation paths, the reasoner can infer more diverse relation paths in the semantic space. Experimental verification is carried out through the link prediction task on the publicly available NELL-995 and FB15K-237 datasets. The results show that RPEN-KGC improves on the baseline methods in most metrics, indicating that it can effectively predict missing knowledge in knowledge graphs.
    Dependency Feature Learning Method for Table Filling for Relation Extraction
    TANG Yuan, CHEN Yanping, HU Ying, HUANG Ruizhang, QIN Yongbin
    Computer Engineering and Applications    2024, 60 (13): 143-151.   DOI: 10.3778/j.issn.1002-8331.2303-0380
    Table-filling-based relation extraction methods use deep neural networks to map sentences to two-dimensional abstract representations, ignoring the semantic structure between different spans in a sentence and struggling to capture long-distance semantic dependencies. Aiming at this shortcoming of table filling methods, this paper proposes a table-filling relation extraction model that incorporates the syntactic dependency tree. First, the model maps sentences to 2D abstract representations via a biaffine transformation. Then, a semantic dependency adjacency matrix is initialized from the syntactic dependency tree of the sentence, so that dependency features between words in the two-dimensional representation can be learned from this matrix. Finally, the 2D representation of the sentence is updated using gated recurrent units to extract features, capturing the semantic dependencies between spans and the structural features of the sentence in the 2D abstract representation. Experimental results show that the proposed model can effectively acquire long-distance semantic dependency features in sentences and improves relation extraction performance by learning span semantic dependency information and sentence grammatical structure features.
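    A minimal sketch of a biaffine scorer that maps a pair of token representations to a 2D table of pairwise scores, the kind of sentence-to-table mapping the abstract mentions; the dimensions and initialization are placeholders.

```python
import torch
import torch.nn as nn

class Biaffine(nn.Module):
    """Biaffine scorer: produces a 2D table of scores, one cell per
    (head word, tail word) pair (illustrative sketch)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_dim, in_dim + 1, in_dim + 1) * 0.02)

    def forward(self, h_head, h_tail):      # both: (batch, seq_len, in_dim)
        ones = h_head.new_ones(*h_head.shape[:-1], 1)
        h_head = torch.cat([h_head, ones], dim=-1)   # add bias dimension
        h_tail = torch.cat([h_tail, ones], dim=-1)
        # (batch, out_dim, seq_len, seq_len) table of pairwise scores
        return torch.einsum("bxi,oij,byj->boxy", h_head, self.U, h_tail)

# Usage sketch: table = Biaffine(256, num_labels)(head_repr, tail_repr)
```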
    Incorporating Syntax Enhancement and Noise Reduction for Aspect-Based Sentiment Analysis Model
    WANG Hongsong, LI Jiazhan, YE Haoxian, TAO Ran
    Computer Engineering and Applications    2024, 60 (13): 152-161.   DOI: 10.3778/j.issn.1002-8331.2303-0452
    Aspect-based sentiment analysis (ABSA) aims to determine the sentiment polarity of a given aspect word in a sentence. Recent research has mainly used dependency syntax information to implicitly associate the sentiment interaction between aspect words and target words. However, relying on dependency syntax information alone lacks recognition of the local context centered on aspect words, and modeling complex syntactic information indiscriminately introduces noise that harms model performance. To address these issues, a neural network model that combines syntax enhancement and noise reduction is proposed. The method integrates constituent information with dependency syntax information, allowing the model to attend not only to global dependencies between words but also to local dependencies centered on aspect words. Furthermore, to reduce interference from syntactic noise, the model weakens it using distance information from the dependency syntax tree. Finally, the model is tested on four benchmark datasets and outperforms the baseline models on all of them.
    Poetry Generative Model Incorporating Prosodic Features
    WU Lindong, HE Xiangzhen, WAN Fucheng
    Computer Engineering and Applications    2024, 60 (13): 162-170.   DOI: 10.3778/j.issn.1002-8331.2303-0477
    Prosodic regularity and topic consistency in poetry generation have long been research hotspots in natural language generation. To improve prosodic regularity, this paper proposes a Transformer-based poetry generation model combined with prosody (Transformer and prosodic features poetry generation model, TPPG). A prosodic lexicon and a rhyme lexicon are built from prosodic features, and prosodic encoding is introduced into the Transformer encoder, so that more prosodic feature information can be captured during training and a variety of poetic rhythms can be learned. Finally, rhymes are generated from the established rhyme lexicon, and the verse with the highest posterior probability that satisfies the rhythm specification is selected from the candidates, improving the overall regularity and fluency of the poem. Experimental results show that the poetry generated by the TPPG model conforms well to prosody and improves in both manual and automatic evaluation.
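    A minimal sketch of the input-side fusion the abstract describes: a prosodic encoding added to the token and positional embeddings before the Transformer encoder. Vocabulary size, prosody-tag count, and maximum length are placeholder assumptions.

```python
import torch
import torch.nn as nn

class ProsodyAwareEmbedding(nn.Module):
    """Token embedding plus positional and prosodic encodings
    (illustrative sketch of prosody-aware Transformer input)."""

    def __init__(self, vocab_size, num_prosody_tags, d_model, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.prosody = nn.Embedding(num_prosody_tags, d_model)

    def forward(self, token_ids, prosody_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.tok(token_ids) + self.pos(positions) + self.prosody(prosody_ids)

# The summed embedding is then fed to a standard nn.TransformerEncoder.
```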
    Point Cloud Classification Segmentation Combining Inter-Region Structure Relations and Self-Attention Edge Convolution Network
    LYU Zhiwei, YANG Jiazhi, ZHOU Guoqing, SHEN Lu
    Computer Engineering and Applications    2024, 60 (13): 171-179.   DOI: 10.3778/j.issn.1002-8331.2304-0141
    A new network framework, ISEC-Net (inter-region structure relations and self-attention edge convolution network), is proposed to address the insufficient capture of contextual features and relational features within a region in deep learning point cloud networks. The network consists of two modules: IrConv (inter-region convolution) and SaConv (self-attention convolution). The SaConv module extracts finer edge features, while IrConv dynamically integrates local structural information into point features and adaptively captures inter-region relationships. Extensive experiments are conducted on the ModelNet40 and ShapeNet datasets for point cloud classification and part segmentation. The results show that on the ModelNet40 dataset, the overall accuracy (OA) of ISEC-Net reaches 93.5% and the average accuracy (mAcc) reaches 90.7%. On the ShapeNet dataset, the mean intersection-over-union (mIoU) reaches 86.1%, and the per-class IoU for parts such as guitar, headphone, and cup is excellent. This demonstrates that, compared with traditional dynamic graph convolutional networks, ISEC-Net can accurately capture the local features and fine structure of point clouds and enhance the aggregation of global features, showing strong effectiveness and generalization ability.
    Local and Global View Occlusion Facial Expression Recognition Method
    NAN Yahui, HUA Qingyi
    Computer Engineering and Applications    2024, 60 (13): 180-189.   DOI: 10.3778/j.issn.1002-8331.2309-0213
    Various occlusions in real scenes increase the difficulty of expression recognition. This paper proposes a method consisting of a locally weighted convolutional attention slider and a global attention-pooling vision Transformer to address the occlusion problem. It extracts facial feature maps using a backbone convolutional neural network, crops the facial feature map into multiple regions, and uses a local patch attention unit to perceive occluded regions by adaptively computing the attention weights of local features, extracting local facial expression features. The facial feature map is then converted into patch blocks, and a vision Transformer with patch-level and token-level attention pooling captures the interactions and correlations between patches from a global perspective. This guides the model to emphasize the most distinctive features while ignoring occlusions, reducing the impact of irrelevant features. Experiments on three expression datasets, their occlusion subsets, and an occlusion dataset show that the proposed model outperforms existing methods in occluded expression recognition.
    News Recommendations Based on User Implicit Feedback Signals and Multi-Dimensional Interests
    WU Jinlu, CUI Xiaohui
    Computer Engineering and Applications    2024, 60 (12): 101-110.   DOI: 10.3778/j.issn.1002-8331.2303-0202
    User preference modeling is a key factor in improving the quality of personalized news recommendation. Existing research usually models the task as click-through-rate estimation and constructs interest representations from the user's explicit feedback signals. However, due to the scarcity of explicit feedback signals and the variety of user interests, current news recommendation methods often suffer from data sparsity and information cocoons. This paper proposes a news recommendation algorithm based on implicit feedback signals and multi-dimensional interests. By introducing implicit feedback signals such as news exposed to the user but not clicked, the data sparsity problem in recommendation modeling is alleviated, and a contrastive attention mechanism is proposed to model the fusion of clicked and unclicked news. In addition, this paper proposes dynamic user interest modeling based on candidate news perception and contrastive learning of dynamic and static interests. The multi-dimensional user interests enable rich, dynamic, and accurate localization of user preferences. Extensive experiments are conducted on a real dataset; comparisons with other baseline methods on three evaluation metrics and a variety of performance tests verify that the proposed model outperforms the other methods.
    Research on Safety Helmet Wearing Detection Algorithm in Chemical Industry Park Scenarios
    LI Yonghui, YUAN Liang, HE Li, RAN Teng, LYU Kai
    Computer Engineering and Applications    2024, 60 (12): 111-117.   DOI: 10.3778/j.issn.1002-8331.2309-0440
    Deep learning-based methods for safety helmet wearing detection are not robust enough, resulting in poor detection performance in chemical industry parks. In this study, a safety helmet wearing detection algorithm, SEE-YOLOv5s, based on YOLOv5s is proposed to improve accuracy. Firstly, a small-target detection layer is added to better capture and locate small targets, improving the model's ability to recognize and detect small targets in complex scenes. Secondly, all C3 modules of YOLOv5s are integrated with the lightweight ECA (efficient channel attention) mechanism, effectively integrating global feature information, improving small-object detection, and reducing model complexity. Finally, the EIoU (efficient intersection over union) loss function is introduced to improve training. Experiments on the self-built SHWD-HG dataset show that the improved YOLOv5s increases P (precision), R (recall), mAP0.5 (mean average precision at IoU 0.5), and mAP0.5:0.95 by 0.5, 6.5, 5.9, and 3.2 percentage points over the original model, respectively, while the model size is reduced by 0.7 MB.
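    A minimal sketch of the ECA channel attention idea mentioned above: a global average pool followed by a 1D convolution across channels and a sigmoid gate. The kernel size of 3 is an assumed setting, not necessarily the one used in SEE-YOLOv5s.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention (illustrative sketch)."""

    def __init__(self, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):                    # x: (batch, channels, H, W)
        y = x.mean(dim=(2, 3))               # global average pooling -> (batch, channels)
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # local cross-channel interaction
        return x * torch.sigmoid(y)[:, :, None, None]
```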
MC-NAS: Visual Module Contribution Neural Architecture Search Method
    ZHANG Rui, LI Ji, CHAI Yanfeng
    Computer Engineering and Applications    2024, 60 (12): 118-128.   DOI: 10.3778/j.issn.1002-8331.2303-0046
    Existing NAS methods cannot directly show the relationship between network models or candidate modules and model recognition accuracy. At the same time, many NAS methods have poor scalability and cannot extend their search strategies to arbitrary search spaces. In response to these challenges, this paper proposes a visual module contribution neural architecture search method. The concept of module contribution is first proposed, and a unified sampling principle for arbitrary search spaces is given by analyzing the difficulties of the contribution calculation process. Finally, the neural network architecture is generated through a dynamic programming algorithm under specific constraints. Extensive experimental results demonstrate the effectiveness of the proposed algorithm. Using the CIFAR-10, CIFAR-100, and ImageNet16-120 datasets, the average accuracy on the NAS-Bench-201 benchmark is 93.33%, 71.07%, and 42.69%, respectively.
    Multi-Knowledge Base Common Sense Question Answering Model Based on Local Feature Fusion
    TIAN Yuqing, WANG Chunmei, YUAN Feiniu
    Computer Engineering and Applications    2024, 60 (12): 129-135.   DOI: 10.3778/j.issn.1002-8331.2303-0080
    In current commonsense reasoning models based on multi-knowledge-base fusion, the input and feature combination is too simple, resulting in the loss of important information related to the questions and answers and limiting the effect of commonsense reasoning models that integrate external knowledge. In addition, in the commonsense question answering task, the anisotropy of the vectors output by the pre-trained language model and of the answer representations has not been resolved. These problems lead to poor reasoning performance in commonsense question answering. To address them, this paper proposes a multi-knowledge-base commonsense question answering model based on local feature fusion, which improves the fusion of external knowledge bases and question-answer texts. The model integrates local question-answer features into the global features of the pre-trained language model to enrich the feature information, and combines features of multiple dimensions in the prediction layer for prediction. The sentence representations of the questions and candidate answers are whitened before the matching task is performed; the whitening operation enhances the isotropy of the sentence representations and improves the representation ability of the sentence vectors. This paper also explores the effect of different pre-trained encoders (such as ALBERT and ELECTRA) on the model to strengthen the feature extraction ability for knowledge text and to demonstrate the stability of the model. Experimental results show that with the same BERT-base encoder, the accuracy of the model reaches 78.6%, 3.5 percentage points higher than the baseline model; with the ELECTRA-base encoder, the accuracy reaches 80.1%.
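    A minimal sketch of whitening sentence representations to improve isotropy, as described above; the SVD-based transform and the optional dimensionality cut are assumptions about the exact variant used.

```python
import numpy as np

def whiten(embeddings, k=None):
    """Whiten sentence embeddings: subtract the mean and rotate/scale with the
    SVD of the covariance so the output is (approximately) isotropic.
    Optional k keeps only the top-k directions (an assumed variant)."""
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov((embeddings - mu).T)
    u, s, _ = np.linalg.svd(cov)
    W = u @ np.diag(1.0 / np.sqrt(s + 1e-8))
    if k is not None:
        W = W[:, :k]
    return (embeddings - mu) @ W

# Usage sketch: question and answer vectors are whitened jointly before cosine matching.
```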
    Multi-Relational Graph Self-Attention Mechanism Enhanced Knowledge Representation Learning
    LIU Dongshuai, AN Jingmin, MENG Fanchen, LI Guanyu
    Computer Engineering and Applications    2024, 60 (12): 136-143.   DOI: 10.3778/j.issn.1002-8331.2303-0164
    Heterogeneous graphs represent multi-relational data. Current knowledge representation learning methods improve the representation of knowledge triples by increasing the interaction between entity and relation embeddings, but they cannot make triples carry multi-level semantics, that is, the multiple associated attributes of an entity under a specific relation. Graph neural networks use structural information to assign weights to an entity's neighbor nodes, but they cannot perform more precise message passing for the complex interactions between entities and their neighbors. To solve this problem, a knowledge representation learning model based on a graph self-attention mechanism (CompESAT) is proposed to encode triples. The self-attention mechanism is introduced into composite entities generated by aggregating neighbors, and the entity representation is dynamically updated as the contributions of different neighbors change. The encoder adaptively learns composite entity embeddings through multiple graph attention layers, which handle multiple local features of the composite entity representation, while the decoder supplements the global features of the decoded triple. In the link prediction task, all evaluation metrics of the model improve on the FB15k-237 dataset, with MRR and Hit@10 improving by 0.042 and 0.045, respectively; on the WN18RR dataset, Hit@10 improves by 0.069.
    End-to-End Aspect-Based Sentiment Analysis Model for BERT and LSI
    DAI Jiamei, KONG Weiwei, WANG Ze, LI Peizhe
    Computer Engineering and Applications    2024, 60 (12): 144-152.   DOI: 10.3778/j.issn.1002-8331.2303-0220
    A model, LSI-BERT, based on BERT and fusing lexical and syntactic information (LSI) is proposed to address the failure of existing end-to-end aspect-based sentiment analysis (E2E-ABSA) research to fully utilize textual information. A BERT embedding layer and a TFM feature extractor are used to extract semantic information, and lexical information is extracted with the industrial-grade natural language processing tool spaCy. Two weighting factors, α and β, are introduced to fuse the semantic and lexical information. A graph attention network (GAT) is used to extract syntactic dependency information from the adjacency matrix generated from the syntactic dependency tree. A dual-stream attention network then fuses the syntactic dependency information with the lexically enriched textual information to achieve better interaction between the two. Experimental results show that the model outperforms current representative models on three commonly used benchmark datasets.
    Method for Recognition of Food Images Based on Improved Attention Model
    JIANG Feng, ZHOU Lili
    Computer Engineering and Applications    2024, 60 (12): 153-159.   DOI: 10.3778/j.issn.1002-8331.2303-0249
    With people's increasing demand for a healthy diet, various kinds of food evaluation assistant software have emerged, and food image recognition has received more and more attention. Food image recognition is a fine-grained recognition problem and is more difficult than other image recognition tasks. Moreover, popular food image datasets such as ISIA Food-500, ETH Food-101, and Vireo Food-172 contain relatively few images, which makes it difficult to train an image recognition system well and further increases the recognition difficulty. In this paper, an image recognition method based on an attention mechanism is proposed. The method introduces the concept of local attention on the basis of self-attention to describe the fine-grained features of an image and improve recognition accuracy. In addition, an image self-supervised pre-training algorithm is proposed to alleviate the problem of insufficient food image training samples. Experimental results show that the Top-1 and Top-5 accuracies of the proposed method on the ISIA Food-500 dataset are 65.58% and 90.03%, respectively, outperforming state-of-the-art algorithms.
DCFNet: Dual-Channel Feature Fusion of Real Scene for Point Cloud Semantic Segmentation
    SUN Liujie, ZHU Yaoda, WANG Wenju
    Computer Engineering and Applications    2024, 60 (12): 160-169.   DOI: 10.3778/j.issn.1002-8331.2305-0290
    Point clouds of real scenes contain not only spatial geometric information but also the color information of 3D objects. Existing networks cannot effectively use the local features and spatial geometric features of real scenes. Therefore, DCFNet, a dual-channel feature fusion network for real-scene point cloud semantic segmentation, is proposed, which can be used for semantic segmentation of both indoor and outdoor scenes. More specifically, to address the problem that the color information of real-scene point clouds cannot be fully extracted, the method uses two input channels with the same feature extraction network structure. The upper channel takes the complete RGB color and point cloud coordinate information as input and focuses mainly on the scene features of complex objects, while the lower channel takes only the point cloud coordinate information and focuses mainly on the spatial geometric characteristics of the point cloud. In each channel, an inter-layer fusion module and a Transformer channel feature expansion module are introduced to better extract local and global information and improve network performance. Meanwhile, existing 3D point cloud semantic segmentation methods pay little attention to the relationship between local and global features, which leads to poor segmentation of complex scenes; in this paper, the features extracted from the upper and lower channels are fused by the DCFFS (dual-channel feature fusion segmentation) module to perform semantic segmentation of the real scene. Experimental results show that the mean intersection over union (mIoU) of the proposed DCFNet reaches 71.18% on the S3DIS Area5 indoor scene dataset and 48.87% on the STPLS3D outdoor scene dataset, and the mean accuracy (mAcc) and overall accuracy (OAcc) reach 77.01% and 86.91%, respectively, achieving high-precision point cloud semantic segmentation of real scenes.
    Cross-Modal Retrieval with Improved Graph Convolution
    ZHANG Hongtu, HUA Chunjian, JIANG Yi, YU Jianfeng, CHEN Ying
    Computer Engineering and Applications    2024, 60 (11): 95-104.   DOI: 10.3778/j.issn.1002-8331.2302-0064
    Aiming at the problem that existing image-text cross-modal retrieval methods find it difficult to fully exploit intra-modal local consistency in the common subspace, a cross-modal retrieval method based on improved graph convolution is proposed. To improve local consistency within each modality, a modality graph is constructed with each sample as a node, fully mining the interaction information between features. To address the problem that graph convolutional networks can only learn shallowly, initial residual connections and weight identity mappings are added to each graph convolution layer to alleviate this limitation. To jointly update the central node features with higher-order and lower-order neighbor information, an improvement that reduces the number of neighbor nodes and increases the number of layers in the graph convolutional network is proposed. To learn a highly locally consistent and semantically consistent common representation, the weights of the common representation learning layer are shared, and the intra-modal semantic constraints and inter-modal invariance constraints in the common subspace are jointly optimized. Experimental results show that on the Wikipedia and Pascal Sentence cross-modal datasets, the average mAP values on different retrieval tasks are 2.2%~42.1% and 3.0%~54.0% higher than those of 11 existing methods.
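    The initial-residual-plus-identity-mapping layer described above resembles the GCNII formulation; the sketch below assumes that formulation, with alpha and beta as placeholder hyperparameters.

```python
import torch
import torch.nn as nn

class DeepGCNLayer(nn.Module):
    """Graph convolution with an initial residual connection and a weight
    identity mapping (GCNII-style; assumed to match the layer the abstract
    describes). A_hat is the normalized adjacency with self-loops."""

    def __init__(self, dim, alpha=0.1, beta=0.1):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.alpha, self.beta = alpha, beta

    def forward(self, H, H0, A_hat):
        agg = A_hat @ H                                      # neighborhood aggregation
        support = (1 - self.alpha) * agg + self.alpha * H0   # initial residual to layer-0 features
        return torch.relu((1 - self.beta) * support + self.beta * self.W(support))
```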
    High-Precision Fall Detection Algorithm with Improved YOLOv5
    ZHU Shenghao, QIAN Chengshan, KAN Xi
    Computer Engineering and Applications    2024, 60 (11): 105-114.   DOI: 10.3778/j.issn.1002-8331.2307-0190
    To address the limitations of the original YOLOv5 in the human fall detection task, a highly accurate fall detection algorithm, C2D-YOLO, is proposed in this paper. The original model struggles to effectively capture complex details, handle deformations, adapt to targets of different scales, and detect occluded targets. To overcome these challenges, several improvements are made to the YOLOv5 model. Firstly, a new feature extraction module called C2D is introduced, which improves feature characterization, captures complex details, and handles deformations by combining deformable convolution, standard convolution, and a channel-spatial hybrid attention mechanism. Secondly, in the neck network, a Swin Transformer block replaces the bottleneck layer of the C3 module to retain more feature information, improving detection accuracy at different scales and enhancing performance under occlusion. Finally, the head of YOLOv5 is enhanced with a decoupled structure borrowed from YOLOX to optimize classification and regression performance. Experimental results show that the method improves mAP0.5 by 3.2 percentage points and mAP0.5:0.95 by 6.5 percentage points compared with the original YOLOv5s, significantly increasing detection accuracy and reducing false alarm rates.
    Motion Imagery Signal Analysis Incorporating Spatio-Temporal Adaptive Graph Convolution
    LIU Jing, KANG Xiaohui, DONG Zehao, LI Xuan, ZHAO Wei, WANG Yu
    Computer Engineering and Applications    2024, 60 (11): 115-128.   DOI: 10.3778/j.issn.1002-8331.2301-0173
    Brain-computer interface (BCI) technology based on motor imagery (MI) EEG signals has received extensive attention and study in motor function rehabilitation for stroke patients. However, MI signals have a low signal-to-noise ratio and large inter-subject variability, which introduces excessive noise and degrades classification performance. How to fully extract MI signal features to obtain higher single-subject classification accuracy, and how to train a general model with good cross-subject performance, are therefore urgent problems for practical MI-BCI systems. In response, this paper proposes a spatio-temporal adaptive graph convolutional network model that extracts MI features in both the temporal and spatial dimensions for classification. The model includes four modules: a spatial adaptive graph convolution module, a temporal adaptive graph convolution module, a feature fusion module, and a feature classification module. The spatial adaptive graph convolution module dynamically constructs the spatial graph representation from the feature similarity between channels, removing the limitation of manually constructed graphs. The temporal adaptive graph convolution module divides the EEG time series into multiple segments and computes the similarity between segments, adaptively constructing the temporal graph representation of the EEG signal and suppressing the influence of noise. Finally, feature fusion and classification are performed. The results show that the proposed method achieves average classification accuracies of 90.45% on the BCI IV-2a dataset and 91.64% on the HGD dataset using 10-fold cross-validation. Compared with current state-of-the-art methods, this is a higher accuracy, proving the effectiveness of the model. In transfer learning experiments on different individuals, the average accuracy increases by 1.66 percentage points, demonstrating the robustness of the model.
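    A minimal sketch of building a data-driven adjacency matrix from the similarity between EEG channel features, the kind of adaptive graph construction the spatial module relies on; the cosine similarity measure and optional top-k sparsification are assumptions.

```python
import torch
import torch.nn.functional as F

def adaptive_adjacency(x, topk=None):
    """Build a graph from channel features by cosine similarity (illustrative).

    x: (batch, channels, features) -> returns (batch, channels, channels)."""
    x = F.normalize(x, dim=-1)
    A = torch.relu(x @ x.transpose(1, 2))         # cosine similarity, negatives clipped
    if topk is not None:                          # keep only the strongest neighbours (assumed option)
        thresh = A.topk(topk, dim=-1).values[..., -1:]
        A = A * (A >= thresh)
    return F.normalize(A, p=1, dim=-1)            # row-normalize for graph convolution

# Usage sketch: H_next = adaptive_adjacency(H) @ H @ W  (one adaptive graph-conv step)
```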
    Named Entity Recognition for Mine Electromechanical Equipment Monitoring Text
    QIU Yunfei, XING Haoran, YU Zhilong, ZHANG Wenwen
    Computer Engineering and Applications    2024, 60 (11): 129-138.   DOI: 10.3778/j.issn.1002-8331.2302-0246
    Correctly extracting entities such as equipment name, parameter standard, fault location, and fault type from mine electromechanical equipment monitoring text can help experts find abnormal equipment as soon as possible and improve the efficiency and accuracy of equipment fault analysis. Since most entities in the mine electromechanical equipment domain are nested, have long spans, and are strongly context-dependent, an entity recognition method combining multi-granularity features is proposed in this paper. The boundaries of long nested entities are initially determined by a machine reading comprehension framework, and the contextual associations between entities are deeply explored by a BiLSTM network with an attention mechanism. Experimental results show that this method recognizes entities in mine electromechanical equipment monitoring text well and also improves the effectiveness of other named entity recognition tasks in low-resource scenarios.
    Multi-Object Tracking Algorithm Based on Improved FairMOT
    LI Wang, ZHANG Nana
    Computer Engineering and Applications    2024, 60 (11): 139-146.   DOI: 10.3778/j.issn.1002-8331.2302-0314
    In response to problems such as missed detections and data association schemes that cause frequent identity switches among objects in complex environments, a multi-object tracking algorithm, MFMOT, built on the FairMOT framework is proposed. Firstly, a lightweight multi-branch attention module is designed, which uses channel grouping to reduce complexity and enhances features along three dimensions, enabling the network to select and extract feature information. Secondly, the re-identification branch uses the PolyLoss loss function to enhance the semantic information between similar objects and distinguish different objects of the same type. Finally, a multi-feature fusion similarity matrix is proposed, which fuses multiple feature similarity matrices into an optimal similarity matrix and reduces the number of identity switches between targets. Experimental results show HOTA scores of 61.5% and 56.1% on the MOT17 and MOT20 datasets, respectively, improvements of 2.2 and 2.3 percentage points over the original FairMOT model. Furthermore, when the multi-feature fusion similarity matrix is applied to a multi-object tracking method of the same paradigm as FairMOT, improvements in HOTA, MOTA, and IDF1 are also observed.
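    A minimal sketch of fusing several similarity matrices into one cost matrix for data association, the general idea behind the multi-feature fusion matrix above; the equal weights and matching threshold are illustrative assumptions, not MFMOT's actual settings.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fused_cost_matrix(iou_sim, emb_sim, w_iou=0.5, w_emb=0.5):
    """Fuse a motion/IoU similarity matrix with an appearance (embedding)
    similarity matrix into one assignment cost matrix (illustrative)."""
    fused = w_iou * iou_sim + w_emb * emb_sim     # (num_tracks, num_detections)
    return 1.0 - fused                            # similarity -> cost

def associate(iou_sim, emb_sim, max_cost=0.8):
    cost = fused_cost_matrix(iou_sim, emb_sim)
    rows, cols = linear_sum_assignment(cost)      # Hungarian matching
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]

# Usage sketch: matches = associate(iou_matrix, cosine_matrix)
```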
    Medical Named Entity Recognition Incorporating Word Information and Graph Attention
    ZHAO Zhenzhen, DONG Yanru, LIU Jing, ZHANG Junzhong, CAO Hui
    Computer Engineering and Applications    2024, 60 (11): 147-155.   DOI: 10.3778/j.issn.1002-8331.2302-0321
    Chinese clinical natural language text is rich in medical record information. Named entity recognition for electronic medical records can help build medical auxiliary diagnostic systems, which is of great significance for the medical field and also benefits downstream tasks such as relation extraction and knowledge graph construction. However, Chinese electronic medical records involve difficult word segmentation, numerous medical terms, and special expressions, which easily lead to incorrect text feature representation. Therefore, this paper proposes a medical named entity recognition model based on enhanced word information and graph attention, which improves performance by strengthening local and global features. Because embedding only character vectors for Chinese entity recognition easily ignores word information and semantics, highly correlated word vectors are embedded alongside the character vectors, which both enhances the text representation and avoids word segmentation errors. Additionally, a MedBert model that has learned medical knowledge is embedded in the word embedding layer; it can dynamically generate feature vectors according to context, helping to solve the problems of polysemy and specialized vocabulary in electronic medical records. Meanwhile, a graph attention module is added to the encoding layer to strengthen the network's ability to learn contextual relationships and the model's learning of special medical grammar. Finally, F1 scores of 86.38% and 84.76% are obtained on the cEHRNER and cMedQANER datasets, respectively, showing better robustness than other models.
    Effective Mask and Local Enhancement for Occluded Person Re-Identification
    WANG Xiaomeng, LIANG Fengmei
    Computer Engineering and Applications    2024, 60 (11): 156-164.   DOI: 10.3778/j.issn.1002-8331.2304-0339
    The human body is often occluded by various obstacles in surveillance systems, so occluded person re-identification remains a long-standing challenge. Recent methods based on Transformers and external semantic clues have improved feature representation and performance, but weak representations and unreliable semantic clues remain problems. To solve these problems, a novel Transformer-based method is proposed. Firstly, a more efficient way to generate masks is introduced; reliable masks allow the model to be independent of external semantic clues and achieve automatic alignment. Secondly, a sequence reconstruction module based on average attention scores is proposed, which focuses on foreground information more effectively. Thirdly, a local enhancement module is proposed to obtain more robust feature representations. Finally, the performance of the proposed method is compared with various existing methods on the Occluded-Duke, Occluded-ReID, Partial-ReID, and Market-1501 datasets. Rank-1 accuracy reaches 72.3%, 84.8%, 86.5%, and 95.6%, respectively, and mAP reaches 62.9%, 83.2%, 76.4%, and 89.9%. Experimental results demonstrate that the proposed model outperforms other advanced networks.
    Hierarchical Label Text Classification Method with Deep Label Assisted Classification Task
    CAO Yukun, WEI Ziyue, TANG Yijia, JIN Chengkun, LI Yunfeng
    Computer Engineering and Applications    2024, 60 (10): 105-112.   DOI: 10.3778/j.issn.1002-8331.2302-0237
    Hierarchical label text classification is a challenging task in natural language processing, in which each document must be correctly classified into multiple labels organized in a hierarchy. However, labels in the label set carry limited semantic information, and few documents are assigned to deep-level labels, so deep-level labels are inadequately trained, leading to a significant imbalance problem in label training. A two-channel hierarchical label text classification method with a deep label assisted classification task (DLAC) is proposed to deal with these challenges. The method proposes a deep-level label assisted classifier that, on top of label semantic enhancement, effectively uses the text features of the parent label nodes corresponding to deep-level labels (i.e., the rich features of shallow labels) to improve the classification performance of deep-level labels. Experimental results against eleven algorithms on three datasets demonstrate that the proposed model effectively improves the classification performance of deep-level labels and achieves better overall results.
    Non-Stationary Causal Discovery Method Based on Conditional Independence Test
    HAO Zhifeng, ZHANG Weijie, CAI Ruichu, CHEN Wei
    Computer Engineering and Applications    2024, 60 (10): 113-120.   DOI: 10.3778/j.issn.1002-8331.2301-0083
    Abstract37)      PDF(pc) (7492KB)(62)       Save
    Causal discovery from non-stationary time-series data is important but challenging. Existing works mainly assume that the observations change with time or domain, which requires introducing the time or domain index as prior knowledge; such methods are usually not applicable to piecewise-stationary scenarios. Therefore, this paper proposes a non-stationary causal discovery method that combines changepoint detection with a structural vector autoregressive model. It first uses changepoint detection to identify the time points at which the distribution changes, then divides the series into stationary intervals according to these changepoints, and finally applies a stationary causal discovery algorithm to infer the local causal structure of each interval. Experiments on simulated and real-world data demonstrate the effectiveness of the proposed method.
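    A minimal sketch of the segment-then-fit idea is given below: a crude mean-shift changepoint detector splits the series, and a lag-1 vector autoregression is fitted by least squares within each segment, with large coefficients read as candidate time-lagged causal edges. The detector, the lag order, and the edge threshold are all simplifying assumptions, not the method actually used in the paper.

    import numpy as np

    def detect_changepoint(x, min_seg=20):
        # Pick the split that best separates the series into two segments
        # with different means (a single-changepoint detector for illustration).
        best_t, best_gain = None, 0.0
        total = ((x - x.mean(axis=0)) ** 2).sum()
        for t in range(min_seg, len(x) - min_seg):
            left, right = x[:t], x[t:]
            cost = ((left - left.mean(axis=0)) ** 2).sum() + ((right - right.mean(axis=0)) ** 2).sum()
            if total - cost > best_gain:
                best_t, best_gain = t, total - cost
        return best_t

    def fit_var1(segment, thresh=0.2):
        # Lag-1 VAR by least squares; |coefficient| > thresh marks a candidate edge j -> i.
        past, future = segment[:-1], segment[1:]
        coef, *_ = np.linalg.lstsq(past, future, rcond=None)   # column i predicts variable i
        return (np.abs(coef.T) > thresh).astype(int)           # row i, column j: edge j -> i

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(0, 1, (100, 3)), rng.normal(3, 1, (100, 3))])
    t = detect_changepoint(x)                                  # detected regime change
    for seg in (x[:t], x[t:]):
        print(fit_var1(seg))                                   # local causal adjacency per segment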
    Reference | Related Articles | Metrics
    Biomedical Event Trigger Detection Based on Two-Stage Question Answering Paradigm
    XING Shuai, XIONG Yujie, SU Qianmin, HUANG Jihan
    Computer Engineering and Applications    2024, 60 (10): 121-131.   DOI: 10.3778/j.issn.1002-8331.2301-0152
    Abstract24)      PDF(pc) (4982KB)(29)       Save
    Existing biomedical event trigger detection methods have the following shortcomings: redundant information unrelated to triggers is retained, potential correlations between entities and events are ignored, and traditional methods are vulnerable to data scarcity. A biomedical event trigger detection method based on a two-stage question answering paradigm is proposed to address these problems. In the event type identification stage, attention based on syntactic distance is used to capture more meaningful contextual features and exclude interference from irrelevant information; to exploit the latent features of entities, a word-entity-event co-occurrence feature based on global statistics guides event-type-aware attention to explore the strong relationships between words and events. In the trigger localization stage, the trigger indices in the sentence are answered according to questions built from the identified event types, thereby leveraging rich question answering resources for data enhancement. Results on the MLEE corpus show that the two-stage question answering paradigm, the syntactic distance attention, and the event-type-aware attention all improve model performance; the proposed model achieves an F1-score of 81.39% and outperforms other baseline models on the detailed results for multiple event types.
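    The two-stage flow can be pictured with the toy span-pointer sketch below: stage one scores event types from a pooled sentence representation, and stage two forms a question vector from the predicted type and points at trigger start and end positions. Every component here (the mean pooling, the pointer heads, the dimensions) is a hypothetical stand-in rather than the paper's architecture.

    import torch
    import torch.nn as nn

    class TwoStageTriggerDetector(nn.Module):
        def __init__(self, dim, n_event_types):
            super().__init__()
            self.type_head = nn.Linear(dim, n_event_types)        # stage 1: event type
            self.type_embed = nn.Embedding(n_event_types, dim)    # question built from the type
            self.start_head = nn.Linear(2 * dim, 1)               # stage 2: span pointers
            self.end_head = nn.Linear(2 * dim, 1)

        def forward(self, token_feats):
            # token_feats: (seq_len, dim) contextual token representations
            sent = token_feats.mean(dim=0)                        # pooled sentence feature
            event_type = self.type_head(sent).argmax()            # stage 1 prediction
            q = self.type_embed(event_type).expand_as(token_feats)
            pair = torch.cat([token_feats, q], dim=-1)            # token + question pairing
            start = self.start_head(pair).squeeze(-1).argmax()    # stage 2: trigger span start
            end = self.end_head(pair).squeeze(-1).argmax()        # stage 2: trigger span end
            return event_type.item(), start.item(), end.item()

    feats = torch.randn(20, 128)                                  # 20 tokens, hidden size 128
    print(TwoStageTriggerDetector(128, 9)(feats))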
    Reference | Related Articles | Metrics
    3D Box Instance Segmentation Method Using Center Prediction-Clustering
    YANG Yutong, HE Hongjie
    Computer Engineering and Applications    2024, 60 (10): 132-139.   DOI: 10.3778/j.issn.1002-8331.2301-0129
    Abstract20)      PDF(pc) (6542KB)(40)       Save
    With the extensive deployment of deep learning in industry, automated systems for transportation, loading and unloading, packaging, sorting and other links have become a research hotspot in the warehousing and logistics industry. Aiming at the robot box unstacking scene, a point cloud center prediction-clustering network (CPCN) based on deep learning is proposed, which can segment a box stack and calculate the center coordinates of the upper surface of each box. On top of the traditional joint semantic-instance segmentation structure, CPCN designs a center prediction module and a center reinforcement module for the instance segmentation branch. The center prediction module avoids center-point segmentation errors by directly locating the instance center, and the center reinforcement module makes points belonging to the same instance converge towards the center in feature space; both effectively enhance the discriminability of instance features. In addition, the center-instance clustering method designed for instance feature processing computes instance labels by directly measuring distances between instance features, which greatly reduces the computation time. Experiments on the box dataset show that, compared with existing methods, the average accuracy of CPCN improves by at least 0.7 and at most 17.2 percentage points, the instance center accuracy reaches 94.4%, the center offset is as low as 13.70 mm, and the inference speed is faster than that of comparable joint segmentation networks. CPCN is better targeted at box instance segmentation and has good application value.
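    The center prediction-clustering idea can be sketched as follows: each point predicts an offset to its instance center, the shifted points are greedily grouped with a distance threshold, and every group becomes one instance whose center is the mean of its votes. The offsets are simulated here and the radius value is an arbitrary assumption.

    import numpy as np

    def cluster_by_predicted_centers(points, offsets, radius=0.1):
        # points:  (N, 3) raw coordinates; offsets: (N, 3) predicted vectors to the instance center
        shifted = points + offsets                     # every point votes for its instance center
        labels = -np.ones(len(points), dtype=int)
        centers = []
        for i, p in enumerate(shifted):
            for k, c in enumerate(centers):            # assign to an existing center if close enough
                if np.linalg.norm(p - c) < radius:
                    labels[i] = k
                    break
            if labels[i] == -1:                        # otherwise start a new instance
                centers.append(p.copy())
                labels[i] = len(centers) - 1
        return labels, np.array(centers)

    rng = np.random.default_rng(1)
    true_centers = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
    points = np.vstack([c + rng.normal(0, 0.05, (50, 3)) for c in true_centers])
    offsets = np.repeat(true_centers, 50, axis=0) - points      # perfect offsets, for illustration
    labels, centers = cluster_by_predicted_centers(points, offsets)
    print(np.unique(labels), centers.round(2))                  # two instances recovered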
    Reference | Related Articles | Metrics
    Self-Supervised Tabular Data Anomaly Detection Method Based on Knowledge Enhancement
    GAO Xiaoyu, ZHAO Xiaoyong, WANG Lei
    Computer Engineering and Applications    2024, 60 (10): 140-147.   DOI: 10.3778/j.issn.1002-8331.2301-0087
    Abstract49)      PDF(pc) (3197KB)(69)       Save
    Traditional supervised anomaly detection methods have developed rapidly. To reduce the dependence on labels, self-supervised pre-training methods are widely studied, and these studies show that embedding additional intrinsic semantic knowledge is crucial for tabular learning. To mine the rich knowledge in tabular data, a self-supervised tabular data anomaly detection method based on knowledge enhancement (STKE) is proposed, with the following improvements. The data processing module integrates domain (semantic) knowledge and statistical knowledge into feature construction, while self-supervised pre-training (parameter learning) provides contextual knowledge priors to transfer the rich information in tabular data. A masking mechanism is applied to the original data so that masked features are learned from the related unmasked features, and the original values are predicted from representations perturbed with additive Gaussian noise in the hidden space; this strategy allows the model to recover the original feature information even from noisy inputs. A hybrid attention mechanism is used to effectively extract association information between data features. Experimental results on six datasets show the superior performance of the proposed method.
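    A minimal sketch of a mask-and-denoise pretext task of this kind might look like the following: randomly chosen feature entries are masked, Gaussian noise is added in the hidden space, and an MLP is trained to reconstruct the original row. The mask ratio, noise scale, and network sizes are assumptions for the sake of the example.

    import torch
    import torch.nn as nn

    class MaskedDenoisingPretext(nn.Module):
        def __init__(self, n_features, hidden=64, mask_ratio=0.3, noise_std=0.1):
            super().__init__()
            self.mask_ratio, self.noise_std = mask_ratio, noise_std
            self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
            self.decoder = nn.Linear(hidden, n_features)

        def forward(self, x):
            mask = (torch.rand_like(x) < self.mask_ratio).float()
            corrupted = x * (1 - mask)                         # masked feature values set to zero
            h = self.encoder(corrupted)
            h = h + self.noise_std * torch.randn_like(h)       # additive Gaussian noise in hidden space
            recon = self.decoder(h)
            return ((recon - x) ** 2).mean()                   # reconstruct the original row

    model = MaskedDenoisingPretext(n_features=20)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.randn(128, 20)                                   # a batch of tabular rows
    loss = model(x)
    loss.backward()
    opt.step()
    print(float(loss))

    At test time, the per-row reconstruction error is one natural quantity to reuse as an anomaly score, although the abstract does not specify the scoring rule.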
    Reference | Related Articles | Metrics
    Attributed Bipartite Graph Neural Networks with Motifs Information for Network Representation Learning
    LYU Shaoqing, WANG Chichi, LI Tingting, BAO Zhiqiang
    Computer Engineering and Applications    2024, 60 (10): 148-155.   DOI: 10.3778/j.issn.1002-8331.2301-0141
    Abstract35)      PDF(pc) (3097KB)(32)       Save
    At present, network representation learning methods are mostly aimed at homogeneous networks, ignoring the particularity of attributed bipartite networks and the motif structure of networks. To solve these problems, this paper proposes an attributed bipartite graph neural network with motif information for network representation learning (MABG). MABG adjusts edge weights according to the number of butterfly motifs in which the two endpoints of each edge participate, constructing a motif weight matrix and obtaining an attributed bipartite adjacency matrix with motif information. Two different strategies are then adopted to capture explicit and implicit messages in the bipartite network: for explicit relationships, a message-passing mechanism operates between nodes of different types; for implicit relationships, a message alignment mechanism is used among nodes of the same type, and an adversarial model minimizes the difference between input attributes and explicit relationship representations. Finally, a cascaded framework captures high-order network information to obtain the final node representations. Extensive experiments on recommendation tasks over four real-world datasets demonstrate the effectiveness of MABG compared with other state-of-the-art methods.
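    As a small illustration of motif-based edge reweighting, the sketch below counts, for each existing edge of a binary biadjacency matrix B, the number of butterfly motifs (2 x 2 bicliques) containing it, using the identity count(u, v) = (B B^T B)[u, v] - deg(u) - deg(v) + 1 on existing edges; how these counts are turned into final edge weights is only an assumed choice here.

    import numpy as np

    def butterfly_edge_counts(B):
        # B: (num_u, num_v) binary biadjacency matrix of a bipartite graph.
        # For every existing edge (u, v), count the butterflies (2 x 2 bicliques)
        # that contain it: (B B^T B)[u, v] - deg(u) - deg(v) + 1.
        deg_u = B.sum(axis=1, keepdims=True)          # degrees on the U side
        deg_v = B.sum(axis=0, keepdims=True)          # degrees on the V side
        paths = B @ B.T @ B                           # 3-step path counts u -> v' -> u' -> v
        return (paths - deg_u - deg_v + 1) * B        # keep existing edges only

    B = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 1, 1]])
    counts = butterfly_edge_counts(B)
    print(counts)
    weights = B + counts                              # one possible motif-aware reweighting (assumed)
    print(weights)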
    Reference | Related Articles | Metrics
    User Consistent Social Recommendation for Multi-View Fusion
    ZHAO Wentao, LIU Tiantian, XUE Saili, WANG Dewang
    Computer Engineering and Applications    2024, 60 (10): 156-163.   DOI: 10.3778/j.issn.1002-8331.2301-0099
    Abstract34)      PDF(pc) (3192KB)(54)       Save
    Aiming at the low accuracy of traditional social recommendation, this paper proposes a user-consistent social recommendation model based on multi-view fusion. The model takes into account the inconsistency of users in social networks and the influence of single-view information on recommendation results. It uses an attention mechanism to dynamically filter out inconsistent social neighbors and combines user-item interaction information to learn user feature representations. At the same time, item representations in a low-dimensional space are learned from multiple views such as the knowledge graph and the user-item interaction history. Finally, the inner product of the user and item representations is computed to complete the recommendation task. To verify the effectiveness of the proposed algorithm, it is compared with six baseline models on the two public datasets Douban and Yelp, using recall, normalized discounted cumulative gain (NDCG) and precision as evaluation metrics. The experimental results show that the proposed social recommendation model outperforms the other models.
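    One simple way to picture the attention-based neighbor weighting and inner-product scoring is sketched below: each social neighbor embedding is weighted by a softmax attention score against the target user, the weighted sum refines the user representation, and the recommendation score is the inner product with the item embedding. The dimensions and the additive fusion are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def social_user_representation(user, neighbors):
        # user: (d,) target user embedding; neighbors: (k, d) social neighbor embeddings
        attn = F.softmax(neighbors @ user, dim=0)        # consistency score for each neighbor
        social = attn @ neighbors                        # attention-weighted neighbor aggregation
        return user + social                             # fuse the two views (assumed additive fusion)

    def score(user_repr, item_repr):
        return torch.dot(user_repr, item_repr)           # inner-product recommendation score

    d = 32
    user = torch.randn(d)
    neighbors = torch.randn(5, d)
    item = torch.randn(d)
    print(float(score(social_user_representation(user, neighbors), item)))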
    Reference | Related Articles | Metrics
    Multi-Model Fusion VoxSRC22 Speaker Diarization System
    DU Yuxuan, ZHOU Ruohua
    Computer Engineering and Applications    2024, 60 (10): 164-172.   DOI: 10.3778/j.issn.1002-8331.2301-0080
    Abstract27)      PDF(pc) (3364KB)(27)       Save
    To address the speaker diarization problem effectively, a novel speaker diarization method is proposed. The method consists of six modules: voice activity detection (VAD), speech enhancement, speaker embedding extraction, speaker clustering, overlapping speech detection (OSD), and result fusion. Speech enhancement improves the performance of voice activity detection, and the effective combination of different speaker embedding extractors and clustering algorithms further reduces the diarization error rate; the best performance is achieved by handling overlapping speech after system fusion. Experimental results show that the proposed system outperforms the baseline by 72%, achieving a diarization error rate (DER) of 5.48% and a Jaccard error rate (JER) of 32.10% on the VoxCeleb speaker recognition challenge (VoxSRC) 2022 evaluation set, ranking fourth.
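    A bare-bones version of the speaker clustering step could look like the sketch below, which groups segment-level speaker embeddings by agglomerative clustering on cosine distance; the embeddings are simulated and the distance threshold is an arbitrary assumption, not a tuned system parameter.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def cluster_speaker_embeddings(embeddings, threshold=0.6):
        # embeddings: (num_segments, dim) one speaker embedding per speech segment
        z = linkage(embeddings, method="average", metric="cosine")
        return fcluster(z, t=threshold, criterion="distance")   # cluster id per segment

    rng = np.random.default_rng(0)
    spk_a = rng.normal(0, 1, 192)                               # two simulated speaker voiceprints
    spk_b = rng.normal(0, 1, 192)
    segments = np.vstack([spk_a + rng.normal(0, 0.1, (4, 192)),
                          spk_b + rng.normal(0, 0.1, (4, 192))])
    print(cluster_speaker_embeddings(segments))                 # e.g. [1 1 1 1 2 2 2 2]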
    Reference | Related Articles | Metrics