Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (18): 132-135.

• Database, Signal and Information Processing •

混合遗传算法和隐马尔可夫模型的Web信息抽取

肖基毅,邹腊梅,李传琦   

  1. 南华大学 计算机科学与技术学院,湖南 衡阳 421001
  • 收稿日期:2007-09-25 修回日期:2007-11-30 出版日期:2008-06-21 发布日期:2008-06-21
  • 通讯作者: 肖基毅

Hybrid genetic algorithm and hidden Markov model for Web information extraction

XIAO Ji-yi, ZOU La-mei, LI Chuan-qi

  1. School of Computer Science and Technology, University of South China, Hengyang, Hunan 421001, China
  • Received: 2007-09-25 Revised: 2007-11-30 Online: 2008-06-21 Published: 2008-06-21
  • Contact: XIAO Ji-yi

摘要: 传统Web信息抽取的隐马尔可夫模型对初值十分敏感和在实际训练中极易得到局部最优模型参数。提出了一种使用遗传算法优化HMM模型参数的Web信息抽取混合算法。该算法使用实数矩阵编码表示染色体,似然概率值为适应度取值,将GA与Baum-Welch算法相结合对HMM模型参数进行全局优化,并且调整GA-HMM的Baum-Welch算法参数实现Web信息抽取。实验结果表明,新的算法在精确度和召回率指标上比传统HMM具有更好的性能。

Abstract: The traditional training method for hidden Markov models (HMMs) in Web information extraction is sensitive to the initial model parameters and, in practice, easily converges to locally optimal parameters. A hybrid algorithm is proposed that uses a genetic algorithm (GA) to optimize the HMM parameters for Web information extraction. The algorithm encodes each chromosome as a real-valued matrix, uses the likelihood as the fitness value, and combines the GA with the Baum-Welch algorithm to optimize the HMM parameters globally; the Baum-Welch parameters in the resulting GA-HMM are then adjusted to perform Web information extraction. Experimental results show that the new algorithm achieves better precision and recall than the traditional HMM.
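
The abstract gives no implementation details, so the following is only a minimal sketch of how a GA/Baum-Welch hybrid of this kind might look, not the authors' algorithm. It assumes a discrete-observation HMM, represents each chromosome as the real-valued matrices (pi, A, B), and uses the log-likelihood from a scaled forward pass as fitness. The function names, the arithmetic crossover, the Gaussian mutation, the elitism, the pseudo-count smoothing, and all numeric defaults are illustrative assumptions. The extraction step itself (labeling Web page tokens with the trained model) is not shown.

```python
import math
import random

def normalize(row):
    s = sum(row)
    return [x / s for x in row]

def random_hmm(n_states, n_symbols, rng):
    """Chromosome = (pi, A, B); every row is a probability distribution."""
    row = lambda k: normalize([rng.random() + 1e-3 for _ in range(k)])
    return (row(n_states),
            [row(n_states) for _ in range(n_states)],
            [row(n_symbols) for _ in range(n_states)])

def forward(hmm, obs):
    """Scaled forward pass: normalized alphas and per-step scale factors."""
    pi, A, B = hmm
    n, alphas, scales = len(pi), [], []
    for t, o in enumerate(obs):
        if t == 0:
            a = [pi[i] * B[i][o] for i in range(n)]
        else:
            prev = alphas[-1]
            a = [sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
        c = sum(a)
        alphas.append([x / c for x in a])
        scales.append(c)
    return alphas, scales

def log_likelihood(hmm, seqs):
    """Fitness: sum over sequences of log P(O | model)."""
    return sum(math.log(c) for obs in seqs for c in forward(hmm, obs)[1])

def baum_welch_step(hmm, seqs, eps=1e-6):
    """One re-estimation step; eps is an assumed smoothing pseudo-count."""
    pi, A, B = hmm
    n, m = len(pi), len(B[0])
    pi_acc = [eps] * n
    A_acc = [[eps] * n for _ in range(n)]
    B_acc = [[eps] * m for _ in range(n)]
    for obs in seqs:
        alphas, scales = forward(hmm, obs)
        beta = [1.0] * n  # scaled backward variable at the last time step
        for t in range(len(obs) - 1, -1, -1):
            for i in range(n):
                gamma = alphas[t][i] * beta[i]
                B_acc[i][obs[t]] += gamma
                if t == 0:
                    pi_acc[i] += gamma
            if t > 0:
                w = [B[j][obs[t]] * beta[j] / scales[t] for j in range(n)]
                for i in range(n):
                    for j in range(n):  # expected transitions i -> j at t-1
                        A_acc[i][j] += alphas[t - 1][i] * A[i][j] * w[j]
                beta = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    return (normalize(pi_acc), [normalize(r) for r in A_acc],
            [normalize(r) for r in B_acc])

def crossover(p1, p2, rng):
    """Arithmetic crossover: a convex mix keeps every row stochastic."""
    w = rng.random()
    mix = lambda r1, r2: [w * x + (1 - w) * y for x, y in zip(r1, r2)]
    return (mix(p1[0], p2[0]),
            [mix(a, b) for a, b in zip(p1[1], p2[1])],
            [mix(a, b) for a, b in zip(p1[2], p2[2])])

def mutate(hmm, rng, rate=0.1, sigma=0.05):
    """Perturb some rows with Gaussian noise, then renormalize them."""
    def mut(row):
        if rng.random() >= rate:
            return row
        return normalize([max(1e-6, x + rng.gauss(0, sigma)) for x in row])
    pi, A, B = hmm
    return mut(pi), [mut(r) for r in A], [mut(r) for r in B]

def ga_hmm(seqs, n_states, n_symbols, pop_size=12, generations=15, seed=0):
    """GA search with a Baum-Welch step applied to each new child."""
    rng = random.Random(seed)
    fit = lambda h: log_likelihood(h, seqs)
    pop = [random_hmm(n_states, n_symbols, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        history.append(fit(pop[0]))
        parents = pop[: max(2, pop_size // 2)]
        children = [pop[0]]  # elitism: the best model survives unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = mutate(crossover(p1, p2, rng), rng)
            children.append(baum_welch_step(child, seqs))
        pop = children
    best = max(pop, key=fit)
    history.append(fit(best))
    return best, history
```

A call such as `best, history = ga_hmm([[0, 1, 2, 0, 1, 2], [2, 0, 1, 2]], n_states=2, n_symbols=3)` returns the fittest model together with the best log-likelihood per generation. Because the elite individual is copied unchanged, that history never decreases. The GA supplies diverse starting points while Baum-Welch climbs locally from each, which is one plausible way to reduce the sensitivity to initial values that the abstract describes.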