Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (23): 143-148.

• Databases, Data Mining, Machine Learning •

Boosted constrained conditional random fields for Web object information extraction

HUANG Yanjiao, WU Qin, LIANG Jiuzhen

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online:2015-12-01 Published:2015-12-14

Abstract: Linear-chain conditional random fields (CRFs) have difficulty capturing the feature relationships between Web objects and their labeled attributes. To address this problem, an improved sequence labeling model named Boosted Constrained Conditional Random Fields (BCCRFs) is proposed. Confidence constraints are introduced into the inference procedure by modifying the Viterbi algorithm of the linear-chain CRF model, and the model is trained following large-margin theory to improve labeling accuracy. The proposed model is compared with conditional random fields and hierarchical conditional random fields. The experimental results show that the BCCRFs model is effective for Web object information extraction and improves labeling accuracy.

Key words: boosted constrained conditional random fields, conditional random fields, attribute labeling, Web object, information extraction
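
The idea of bringing constraints into Viterbi inference can be illustrated with a minimal sketch. It is not the authors' BCCRFs algorithm: the abstract does not specify the form of the constraints, so the code assumes the simplest case, hard constraints that pin a given position to a given label. The function name `constrained_viterbi`, the `forced` parameter, and the toy scores are all hypothetical.

```python
import math


def constrained_viterbi(emit, trans, forced=None):
    """Best label path for a linear-chain model under hard label constraints.

    emit[t][y]  : log-score of label y at position t
    trans[p][y] : log-score of moving from label p to label y
    forced      : dict {position: label}; an assumed, simplified stand-in
                  for the constraints mentioned in the abstract
    """
    forced = forced or {}
    n_labels = len(trans)

    def allowed(t, y):
        # A label is allowed unless the position is pinned to another label.
        return t not in forced or forced[t] == y

    # First column: mask out labels that violate a constraint.
    score = [emit[0][y] if allowed(0, y) else -math.inf
             for y in range(n_labels)]
    back = []  # back[t-1][y] = best previous label for label y at position t

    for t in range(1, len(emit)):
        new_score, pointers = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels),
                            key=lambda p: score[p] + trans[p][y])
            s = score[best_prev] + trans[best_prev][y] + emit[t][y]
            new_score.append(s if allowed(t, y) else -math.inf)
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Follow the back-pointers from the best final label.
    y = max(range(n_labels), key=lambda k: score[k])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    path.reverse()
    return path


# Toy scores (made up for illustration): label 0 is preferred everywhere.
emit = [[0.0, -2.0], [0.0, -2.0], [0.0, -2.0]]
trans = [[0.0, -1.0], [-1.0, 0.0]]
unconstrained = constrained_viterbi(emit, trans)
pinned = constrained_viterbi(emit, trans, forced={1: 1})
```

Masking a disallowed label with negative infinity leaves the dynamic program unchanged while ensuring that no surviving path passes through that label. Soft or confidence-weighted constraints would instead add a finite penalty at the same place in the recursion.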