Computer Engineering and Applications (计算机工程与应用) ›› 2015, Vol. 51 ›› Issue (7): 112-117.

• Database, Data Mining, Machine Learning •

Comparison of clause alignment based on the maximum entropy model and the Back Propagation (BP) neural network model

LIU Ying, WANG Nan

  1. Department of Chinese Language and Literature, Tsinghua University, Beijing 100084, China
  • Published: 2015-04-01 Online: 2015-03-31

Abstract: Clause alignment is studied on a parallel corpus consisting of the classical Chinese text of Shi Ji and its modern Chinese translation, using a maximum entropy model and a Back Propagation (BP) neural network. The maximum entropy model combines clause length, clause alignment mode, and co-occurring Chinese character features to align the clauses of the parallel corpus; the BP neural network combines clause length, clause position, and co-occurring Chinese character features. Experimental results show that the maximum entropy model achieves its highest precision and recall when all three features (clause length, clause alignment mode, and co-occurring Chinese characters) are used together, and that the precision and recall of the maximum entropy model are higher than those of the BP neural network.

Key words: clause alignment, maximum entropy model, Back Propagation neural network model, Records of the Grand Historian (Shi Ji)
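
The abstract names clause length and co-occurring Chinese characters as alignment features and reports results as precision and recall. The abstract does not define how these features are computed, so the following Python sketch is only an illustration under stated assumptions: the function names, the ratio-based feature definitions, the punctuation list, and the modern-Chinese paraphrase in the usage note are all hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# Assumption: punctuation and whitespace carry no alignment signal and are ignored.
PUNCTUATION = set(",。;:!?、,.;:!? ")


def content_chars(clause: str) -> str:
    """Return the clause with punctuation and whitespace removed."""
    return "".join(ch for ch in clause if ch not in PUNCTUATION)


def cooccurring_chars(ancient: str, modern: str) -> Set[str]:
    """Distinct characters that appear in both clauses."""
    return set(content_chars(ancient)) & set(content_chars(modern))


def clause_pair_features(ancient: str, modern: str) -> Tuple[float, float]:
    """Return (length_ratio, cooccurrence_ratio) for one candidate clause pair.

    length_ratio: ancient length divided by modern length, in characters.
    cooccurrence_ratio: share of the ancient clause's distinct characters
    that also occur in the modern clause.
    Both definitions are illustrative assumptions.
    """
    a, m = content_chars(ancient), content_chars(modern)
    if not a or not m:
        return 0.0, 0.0
    length_ratio = len(a) / len(m)
    cooccurrence_ratio = len(cooccurring_chars(ancient, modern)) / len(set(a))
    return length_ratio, cooccurrence_ratio


def precision_recall(predicted: Set[Tuple[int, int]],
                     gold: Set[Tuple[int, int]]) -> Tuple[float, float]:
    """Precision and recall of predicted alignment links against gold links.

    Each link is an (ancient_clause_index, modern_clause_index) pair.
    """
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

For example, `clause_pair_features("项籍者,下相人也", "项籍是下相人")` compares a classical clause with an invented modern paraphrase. In a full system, feature values like these would be passed to a classifier such as a maximum entropy model or a BP neural network, which this sketch does not implement.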