Computer Engineering and Applications, 2025, Vol. 61, Issue (9): 343-352. DOI: 10.3778/j.issn.1002-8331.2311-0048

• Engineering and Applications •

Named Entity Recognition in the Vegetable Cultivation Domain Combining Adversarial Training and Attention Mechanism

HU Qiao, ZHAO Chunjiang, WU Huarui, MIAO Yisheng, GUO Wang

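  1. College of Agricultural Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
  2. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
  3. Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  4. Key Laboratory of Digital Village Technology, Ministry of Agriculture and Rural Affairs, Beijing 100097, China
  • Publication date: 2025-05-01  Release date: 2025-04-30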

Named Entity Recognition in Vegetable Cultivation Combining Adversarial Training and Attention Mechanism

HU Qiao, ZHAO Chunjiang, WU Huarui, MIAO Yisheng, GUO Wang   

  1. College of Agricultural Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
  2. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
  3. Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  4. Key Laboratory of Digital Village Technology, Ministry of Agriculture and Rural Affairs, Beijing 100097, China
  • Online: 2025-05-01  Published: 2025-04-30

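Abstract: Named entity recognition (NER) in the vegetable cultivation domain under complex contexts suffers from three problems: imbalanced entity distribution, unclear entity boundaries, and insufficient semantic association. To address them, this paper proposes a vegetable cultivation NER model based on adversarial training and a multi-head self-attention mechanism. Tomato is taken as the research object. ALBERT (a lite BERT) extracts dynamic word vectors from the corpus. Adversarial training then perturbs these word vectors to generate adversarial samples, which are integrated into the embedding-layer output to alleviate the imbalance of agricultural data. In the feature extraction layer, a multi-head self-attention mechanism further optimizes the weight distribution of the sequence features extracted by a BiLSTM, so that the model pays more attention to boundary information and strengthens the semantic association of the text. Finally, a conditional random field (CRF) decodes the label sequence. Experiments were conducted on Veg-Tomato, a corpus built from 8 categories and 5 542 annotated samples. The results show that the model achieves a precision, recall and F1 score of 89.26%, 85.77% and 87.48%, respectively. These are 0.69, 3.56 and 2.21 percentage points higher than the best baseline model. The model still achieves high recognition precision on small-sample data and is suitable for NER tasks in the vegetable cultivation domain.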
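The abstract says that adversarial training perturbs the ALBERT word vectors to produce adversarial samples. It does not name the perturbation method. One widely used choice for text embeddings is the Fast Gradient Method (FGM). FGM adds r_adv = ε·g/‖g‖₂, where g is the gradient of the training loss with respect to the embeddings. The sketch below assumes FGM. The function names, ε and the toy gradient are hypothetical. In a real model, g would come from backpropagating the NER loss through the network.

```python
import math
from typing import List

Matrix = List[List[float]]  # [seq_len x embedding_dim]


def fgm_perturbation(grad: Matrix, epsilon: float = 1.0) -> Matrix:
    """FGM perturbation r_adv = epsilon * g / ||g||_2 (assumed method).

    `grad` is the gradient of the loss w.r.t. the token embeddings;
    the L2 norm is taken over the whole sequence.
    """
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    if norm == 0.0:
        # No gradient signal: return a zero perturbation.
        return [[0.0] * len(row) for row in grad]
    scale = epsilon / norm
    return [[scale * g for g in row] for row in grad]


def adversarial_embeddings(emb: Matrix, grad: Matrix, epsilon: float = 1.0) -> Matrix:
    """Add the perturbation to the clean embeddings to form an adversarial sample."""
    r_adv = fgm_perturbation(grad, epsilon)
    return [[e + r for e, r in zip(e_row, r_row)] for e_row, r_row in zip(emb, r_adv)]


# Toy example: two tokens with 2-d embeddings and a hand-picked gradient.
emb = [[0.1, 0.2], [0.3, 0.4]]
grad = [[3.0, 0.0], [0.0, 4.0]]  # ||grad||_2 = 5
adv = adversarial_embeddings(emb, grad, epsilon=1.0)
print(adv)  # approximately [[0.7, 0.2], [0.3, 1.2]]
```

In typical FGM training, the loss on the perturbed embeddings is added to the clean loss before the optimizer step. The abstract does not state how the two are combined.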

Keywords: vegetable cultivation, named entity recognition, ALBERT, adversarial training, multi-head self-attention
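The abstract describes a multi-head self-attention layer that re-weights the BiLSTM sequence features. It does not give the number of heads, the dimensions, or how the layer is wired into the network. The sketch below therefore shows only the standard scaled dot-product form, softmax(QKᵀ/√d_k)V per head, with the heads concatenated and then mixed by an output projection. It uses plain Python lists. All names are hypothetical, and the projection matrices, which are learned in practice, are passed in as arguments.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(h: Matrix, num_heads: int, w_q: Matrix, w_k: Matrix,
                              w_v: Matrix, w_o: Matrix) -> Matrix:
    """Self-attention over BiLSTM outputs h ([seq_len x d_model]).

    Q, K and V are all projections of h. The d_model columns are split into
    num_heads slices of width d_k; each head attends on its own slice, the
    head outputs are concatenated and mixed by the output projection w_o.
    """
    seq_len, d_model = len(h), len(h[0])
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_k = d_model // num_heads
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    concat = [[0.0] * d_model for _ in range(seq_len)]
    for head in range(num_heads):
        lo, hi = head * d_k, (head + 1) * d_k
        for i in range(seq_len):
            # score_ij = q_i . k_j / sqrt(d_k), restricted to this head's slice
            scores = [sum(q[i][c] * k[j][c] for c in range(lo, hi)) / math.sqrt(d_k)
                      for j in range(seq_len)]
            weights = softmax(scores)
            for c in range(lo, hi):
                concat[i][c] = sum(w * v[j][c] for j, w in enumerate(weights))
    return matmul(concat, w_o)


# Identity projections are used here purely for illustration.
identity = [[1.0, 0.0], [0.0, 1.0]]
bilstm_out = [[1.0, 0.0], [0.0, 1.0]]  # toy features for two tokens
print(multi_head_self_attention(bilstm_out, 2, identity, identity, identity, identity))
# approximately [[0.731, 0.5], [0.5, 0.731]]
```

Each output row is a weighted mix of all tokens' value vectors. This is what lets a token near an entity boundary draw on context from elsewhere in the sentence.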

Abstract: This paper proposes a named entity recognition (NER) model for the vegetable cultivation domain based on adversarial training and a multi-head self-attention mechanism. It targets three problems of NER in this domain under complex contexts: entity imbalance, unclear entity boundaries and insufficient semantic association. Taking tomato as the research object, the model uses ALBERT (A Lite BERT) to extract dynamic word vectors. Adversarial training perturbs these word vectors to generate adversarial samples, which are integrated into the embedding-layer output to alleviate the imbalance of agricultural data and improve the robustness of the model. In the feature extraction layer, a multi-head self-attention mechanism further optimizes the weight distribution of the sequence features extracted by a BiLSTM. This focuses the model more on boundary information and strengthens the semantic association of the text. Finally, the label sequence is decoded with a conditional random field (CRF). Experiments are conducted on the Veg-Tomato corpus, which consists of 8 categories and 5 542 annotated samples. The results show that the proposed model (VC-AMS) achieves a precision, recall and F1 score of 89.26%, 85.77% and 87.48%, respectively, surpassing the best baseline model by 0.69, 3.56 and 2.21 percentage points. The model maintains high recognition accuracy even on small-sample data, making it suitable for NER tasks in the vegetable cultivation domain.
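The last step in the abstract is decoding the label sequence with a conditional random field. At inference time this amounts to a Viterbi search over per-token emission scores and tag-to-tag transition scores. The sketch below assumes a BIO tagging scheme, which the abstract does not specify. It also uses hand-set scores and hypothetical names. In the trained model, the emissions would come from the attention-weighted BiLSTM features, and the transition matrix would be learned.

```python
from typing import List, Optional, Tuple


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]],
                   start: Optional[List[float]] = None) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score.

    emissions[t][j]   : score of tag j at position t (from the feature layer)
    transitions[i][j] : score of moving from tag i to tag j
    start[j]          : score of starting the sequence with tag j
    """
    n_tags = len(transitions)
    if start is None:
        start = [0.0] * n_tags
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag i for landing on tag j at this position.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[best]


TAGS = ["O", "B", "I"]
NEG_INF = float("-inf")
# Under BIO, O -> I is forbidden; all other transitions score 0 in this toy example.
transitions = [[0.0, 0.0, NEG_INF], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
start = [0.0, 0.0, NEG_INF]  # a sequence cannot begin with I
emissions = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
path, score = viterbi_decode(emissions, transitions, start)
print([TAGS[t] for t in path], score)  # ['B', 'I'] 3.0
```

Token-by-token argmax would pick the invalid sequence O, I. The transition scores make the decoder return the well-formed B, I instead. This is how a CRF layer helps with the boundary consistency the abstract emphasizes.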

Key words: vegetable cultivation, named entity recognition, A Lite BERT (ALBERT), adversarial training, multi-head self-attention
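The reported precision, recall and F1 score are consistent with each other. F1 is the harmonic mean 2PR/(P+R), and 2 × 89.26 × 85.77 / (89.26 + 85.77) ≈ 87.48. The sketch below shows one common way such figures are computed for NER: exact-match scoring of (type, start, end) entity spans decoded from BIO tags. The evaluation protocol is an assumption, not taken from the paper. The entity type names CROP and DIS are hypothetical; the abstract only says the corpus has 8 categories.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start, end), end exclusive


def bio_spans(tags: List[str]) -> Set[Span]:
    """Decode BIO tags such as ["B-DIS", "I-DIS", "O"] into entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # the current entity continues
        if etype is not None:
            spans.add((etype, start, i))
        # B- (or a stray I-) opens a new entity; O closes without opening one.
        start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 (exact span match)."""
    tp = n_pred = n_gold = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_spans(g_tags), bio_spans(p_tags)
        tp += len(g & p)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, f1_score(precision, recall)


gold = [["B-CROP", "I-CROP", "O", "B-DIS", "I-DIS"]]
pred = [["B-CROP", "I-CROP", "O", "B-DIS", "O"]]  # second entity truncated
print(prf1(gold, pred))  # (0.5, 0.5, 0.5)

# Cross-check of the reported figures (in %): F1 from P = 89.26, R = 85.77
print(round(f1_score(89.26, 85.77), 2))  # 87.48
```

Under exact matching, a prediction with a wrong boundary counts as both a false positive and a false negative. That is why unclear entity boundaries lower precision and recall together.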