Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (28): 22-24. DOI: 10.3778/j.issn.1002-8331.2008.28.007

• Doctoral Forum •

真实语料下基于多Agent的分布式英语语块识别

梁颖红1,2,曹 军2,赵铁军3   

  1. 苏州市职业大学 计算机学院,江苏 苏州 215104
  2. 东北林业大学 信息与计算机工程学院,哈尔滨 150040
  3. 哈尔滨工业大学 计算机科学与技术学院,哈尔滨 150001
  • 收稿日期:2008-06-18 修回日期:2008-07-11 出版日期:2008-10-01 发布日期:2008-10-01
  • 通讯作者: 梁颖红

Multi-agent distributed English chunking on real public corpus

LIANG Ying-hong1,2, CAO Jun2, ZHAO Tie-jun3

  1. School of Computer Engineering, Suzhou Vocational University, Suzhou, Jiangsu 215104, China
  2. School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
  3. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  • Received: 2008-06-18 Revised: 2008-07-11 Online: 2008-10-01 Published: 2008-10-01
  • Contact: LIANG Ying-hong

摘要: 为了能比较不同方法的性能,常常希望在公共的训练集和测试集上进行语块识别。但是,用于实验的公共训练集和测试集往往规模较小而且具有领域的局限性。因而,在跨领域的真实语料情况下,语块识别的精确率有很大的下降。采用真实开放语料,设计多组实验研究不同的词性标注结果、不同领域的语料和不同的知识库对语块识别的影响,考察基于多Agent结构的分布式英语语块识别策略在实际系统中应用的可能性。实验表明,基于多Agent结构的分布式英语语块识别策略在真实开放语料下F测度达到了92%,基本能够满足实际应用的需要。

Abstract: To compare the performance of different methods, chunking is usually evaluated on public training and test sets. However, the public sets used in experiments tend to be small and restricted to a particular domain, so chunking precision drops considerably on real, cross-domain corpora. Using real open corpora, this paper designs several groups of experiments to study how different part-of-speech tagging results, corpora from different domains, and different knowledge bases affect chunking, and examines whether a distributed English chunking strategy based on a multi-agent architecture is feasible in a practical system. The experiments show that this strategy achieves an F-measure of 92% on real open corpora, which largely meets the needs of practical applications.
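
The abstract reports an F-measure but does not spell out how it is computed. As a rough illustration only, the sketch below assumes the common chunk-level convention (a predicted chunk counts as correct only if its type and both boundaries match the gold chunk) and a BIO tag encoding; neither assumption is confirmed by the paper, and the function names `extract_chunks` and `chunk_f_measure` are hypothetical.

```python
def extract_chunks(tags):
    """Return the set of (type, start, end) chunks encoded by a BIO tag sequence."""
    chunks, start, ctype = set(), None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and ctype == tag[2:]
        if start is not None and not continues:
            chunks.add((ctype, start, i))  # end index is exclusive
            start, ctype = None, None
        # "B-X" always opens a chunk; a stray "I-X" is leniently treated as an opener.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, ctype = i, tag[2:]
    return chunks


def chunk_f_measure(gold_tags, pred_tags):
    """Chunk-level precision, recall and balanced F-measure (assumed definition)."""
    gold, pred = extract_chunks(gold_tags), extract_chunks(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


# Toy example (invented tags, not data from the paper).
gold = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP", "O"]
pred = ["B-NP", "I-NP", "B-VP", "B-NP", "O", "O"]
precision, recall, f = chunk_f_measure(gold, pred)
```

In the toy example the last noun phrase is predicted with the wrong right boundary, so it counts as an error for both precision and recall even though it partly overlaps the gold chunk.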