Computer Engineering and Applications, 2025, Vol. 61, Issue (21): 351-360. DOI: 10.3778/j.issn.1002-8331.2407-0408

• Engineering and Applications •

Fusion of Online Label Smoothing Strategy and Relational Network for TCM Syndrome Classification Model

LIAO Ming (廖明), DU Jianqiang (杜建强), LUO Jigen (罗计根), HUANG Qiang (黄强), HE Jia (贺佳), FAN Yue (范越)

  1. School of Computer Science, Jiangxi University of Chinese Medicine, Nanchang 330004, China
  2. Nanchang University, Nanchang 330031, China
  3. Key Laboratory of Artificial Intelligence in Chinese Medicine, Jiangxi, Nanchang 330004, China
  • Publication date: 2025-11-01 Release date: 2025-10-31

Abstract: In research on the intelligent classification of traditional Chinese medicine (TCM) syndromes, class imbalance and the scarcity of high-quality manually annotated samples leave models poorly able to learn labels that have few samples, which degrades overall classification performance. To address these problems, a TCM syndrome classification model that fuses an online (dynamic) label smoothing strategy with a relational network is proposed (online label smoothing for relational networks based on pre-trained language models, PLM-SNet). The model first encodes the input case text with a pre-trained language model to obtain a feature representation of each input sample. A relational network then concatenates the feature information of the support set and the query set to produce relevance scores for the query-set samples. Finally, online labels supervise the training loss, and the soft label of each class is updated and optimized in real time during training, yielding the final class scores. Experiments on the public TCM dataset TCM-SD and the self-built TCM asthma dataset J-SD show that, compared with the best-performing baseline model, PLM-SNet improves the Macro-F1 and G-Mean of TCM syndrome classification by 3.47, 2.48, 3.06, and 2.58 percentage points, respectively. These results support the soundness and effectiveness of the model for class-imbalanced TCM syndrome classification.

Key words: online label, relational networks, class imbalance, TCM syndrome classification
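
The pipeline summarized in the abstract (relation scoring between a query and each class's support set, followed by soft labels that are refreshed during training) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the scoring module, the loss weighting, or the update rule, so `toy_score_fn`, the mean-of-support prototype, the mixing weight `alpha`, and the "average the predictions of correctly classified samples once per epoch" update are all assumptions chosen for illustration. Plain lists stand in for the pre-trained language model embeddings.

```python
import math

def softmax(scores):
    """Turn raw relation scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def toy_score_fn(pair):
    # Hypothetical stand-in for a learned relation module: negative squared
    # distance between the query half and the prototype half of the pair.
    half = len(pair) // 2
    return -sum((a - b) ** 2 for a, b in zip(pair[:half], pair[half:]))

def relation_scores(query_vec, support_by_class, score_fn):
    """Score one query against every class (assumed design).

    Each class's support embeddings are averaged into a prototype, the
    query is concatenated with that prototype, and score_fn rates the pair.
    """
    scores = []
    for vectors in support_by_class:
        dim = len(vectors[0])
        prototype = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        scores.append(score_fn(query_vec + prototype))  # list concatenation
    return scores

class OnlineSoftLabels:
    """Per-class soft labels refreshed once per epoch (assumed update rule)."""

    def __init__(self, num_classes, alpha=0.5):
        self.k = num_classes
        self.alpha = alpha  # assumed weight of the hard-label term
        # Start from uniform soft labels for every class.
        self.soft = [[1.0 / num_classes] * num_classes for _ in range(num_classes)]
        self._reset()

    def _reset(self):
        self.acc = [[0.0] * self.k for _ in range(self.k)]
        self.count = [0] * self.k

    def loss(self, probs, target):
        """Mix hard-label and soft-label cross-entropy for one sample."""
        logs = [math.log(max(p, 1e-12)) for p in probs]
        hard = -logs[target]
        soft = -sum(t * lp for t, lp in zip(self.soft[target], logs))
        return self.alpha * hard + (1.0 - self.alpha) * soft

    def observe(self, probs, target):
        """Accumulate the prediction only if the sample was classified correctly."""
        predicted = max(range(self.k), key=lambda j: probs[j])
        if predicted == target:
            for j in range(self.k):
                self.acc[target][j] += probs[j]
            self.count[target] += 1

    def end_epoch(self):
        """Replace each class's soft label with the mean accumulated prediction."""
        for c in range(self.k):
            if self.count[c] > 0:  # classes with no correct hits keep their old label
                self.soft[c] = [v / self.count[c] for v in self.acc[c]]
        self._reset()

# Toy usage: two classes, 2-dimensional "embeddings".
support = [[[0.0, 0.0], [0.2, 0.0]], [[1.0, 1.0]]]
query = [0.1, 0.1]
probs = softmax(relation_scores(query, support, toy_score_fn))

labels = OnlineSoftLabels(num_classes=2)
sample_loss = labels.loss(probs, target=0)
labels.observe(probs, target=0)
labels.end_epoch()
```

In this sketch a rare class that is never predicted correctly simply keeps its previous soft label, which is one possible way to avoid overwriting minority-class targets with noise; whether PLM-SNet handles that case the same way is not stated in the abstract.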