Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (15): 193-199. DOI: 10.3778/j.issn.1002-8331.2004-0352
WANG Haobin, HU Ping
王浩镔,胡平
Abstract:
Existing multi-label classification algorithms ignore the intrinsic relationships among labels. To address this, this paper recasts multi-label classification as a sequence generation problem, so that co-occurrence relationships among labels are fully taken into account. Building on the Seq2Seq model, text features are extracted at two levels: the word level and the semantic level. By improving the feature extraction module, the encoder structure, the mixed attention mechanism, and the prediction part of the decoder, a multi-label classification algorithm based on multi-level features and a mixed attention mechanism is proposed. The algorithm is evaluated on three datasets, Zhihu, RCV1-V2, and AAPD, and compared with existing algorithms. It outperforms the compared algorithms on F1 score, recall, and Hamming loss.
Key words: multi-label classification, multi-level features, mixed attention
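The sequence-generation framing and the evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The frequency-based label ordering, the `<bos>`/`<eos>` tokens, the micro-averaged form of recall and F1, and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
from collections import Counter

# Hypothetical start/end markers for the decoder target sequence.
BOS, EOS = "<bos>", "<eos>"


def build_label_order(label_sets):
    """Rank labels by descending frequency; ties are broken alphabetically.

    Assumption: frequent-first ordering is one common convention for turning
    label sets into sequences; the abstract does not state the ordering used.
    """
    counts = Counter(label for labels in label_sets for label in labels)
    return sorted(counts, key=lambda label: (-counts[label], label))


def labels_to_sequence(labels, order):
    """Turn an unordered label set into an ordered decoder target sequence."""
    rank = {label: i for i, label in enumerate(order)}
    return [BOS] + sorted(labels, key=rank.__getitem__) + [EOS]


def sequence_to_labels(tokens):
    """Recover the predicted label set from generated tokens, stopping at EOS."""
    labels = set()
    for token in tokens:
        if token == EOS:
            break
        if token != BOS:
            labels.add(token)
    return labels


def hamming_loss(true_sets, pred_sets, num_labels):
    """Fraction of (sample, label) decisions that are wrong; lower is better."""
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * num_labels)


def micro_recall_f1(true_sets, pred_sets):
    """Micro-averaged recall and F1 over all (sample, label) pairs."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Toy label sets (made up for illustration, not taken from the paper's data).
train = [{"cs.LG", "stat.ML"}, {"cs.LG"}, {"cs.CL", "cs.LG"}]
order = build_label_order(train)
target = labels_to_sequence({"stat.ML", "cs.LG"}, order)

true_sets = [{"cs.LG", "stat.ML"}, {"cs.CL"}]
pred_sets = [sequence_to_labels(["<bos>", "cs.LG", "<eos>"]),
             sequence_to_labels(["<bos>", "cs.LG", "cs.CL", "<eos>"])]
print(target)
print(hamming_loss(true_sets, pred_sets, num_labels=len(order)))
print(micro_recall_f1(true_sets, pred_sets))
```

Because each label is generated conditioned on the labels already emitted, a decoder trained on such target sequences can exploit label co-occurrence, which is the motivation the abstract gives for the reformulation.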
WANG Haobin, HU Ping. Multi-label Long Text Classification Algorithm Based on Multi-level Features[J]. Computer Engineering and Applications, 2021, 57(15): 193-199.
王浩镔,胡平. 采用多级特征的多标签长文本分类算法[J]. 计算机工程与应用, 2021, 57(15): 193-199.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2004-0352
http://cea.ceaj.org/EN/Y2021/V57/I15/193