Computer Engineering and Applications, 2015, Vol. 51, Issue (3): 117-123.

• Databases, Data Mining, Machine Learning •

Deep Web schema matching framework based on sampling

YUAN Miao, WANG Xin

  1. School of Computer and Information, Hefei University of Technology, Hefei 230009, China
  • Publication date: 2015-02-01  Release date: 2015-01-28

Abstract: The DCM (Dual Correlation Mining) framework has low precision when the input schema set contains certain special schemas. Inspired by the bagging method from machine learning, this paper proposes a sampling-based Deep Web schema matching framework. The framework randomly draws several sub-schema sets from the input schema set, runs the DCM matcher (complex matching) on each subset separately, and integrates the per-subset matching results to improve overall matching precision. Analysis and experiments show that, on such special schema sets, the framework improves precision by 41.2% on average.

Key words: Deep Web, schema matching, dual correlation mining, sampling
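The abstract describes the framework only at a high level, so the following is a minimal sketch of the bagging-style idea under stated assumptions, not the authors' implementation. Each schema (query interface) is modeled as a set of attribute names. `toy_matcher` is a hypothetical stand-in for the DCM matcher and is not DCM itself: it pairs frequent attributes that never appear together, loosely echoing the idea that synonym attributes rarely co-occur. Sub-schema sets are drawn without replacement at a fixed ratio, whereas classic bagging draws bootstrap samples with replacement. Per-subset results are combined by a simple vote threshold, which is an assumed integration rule. The names `sampled_matching`, `n_samples`, `sample_ratio` and `min_vote` are all hypothetical.

```python
import random
from collections import Counter
from itertools import combinations
from typing import Callable, FrozenSet, List, Set

Schema = FrozenSet[str]  # one query interface, modeled as its set of attribute names
Match = FrozenSet[str]   # an unordered pair of attributes judged to be synonyms
Matcher = Callable[[List[Schema]], Set[Match]]


def toy_matcher(schemas: List[Schema], min_support: float = 0.2) -> Set[Match]:
    """Hypothetical stand-in for DCM (NOT the DCM algorithm).

    Pairs up attributes that each appear in at least `min_support` of the
    schemas but never appear together in the same schema.
    """
    n = len(schemas)
    freq = Counter(attr for schema in schemas for attr in schema)
    frequent = sorted(a for a, c in freq.items() if c / n >= min_support)
    matches: Set[Match] = set()
    for a, b in combinations(frequent, 2):
        if not any(a in s and b in s for s in schemas):
            matches.add(frozenset((a, b)))
    return matches


def sampled_matching(
    schemas: List[Schema],
    matcher: Matcher,
    n_samples: int = 10,
    sample_ratio: float = 0.5,
    min_vote: float = 0.5,
    seed: int = 0,
) -> Set[Match]:
    """Run `matcher` on random sub-schema sets; keep matches with enough votes."""
    rng = random.Random(seed)
    size = max(1, int(len(schemas) * sample_ratio))
    votes: Counter = Counter()
    for _ in range(n_samples):
        subset = rng.sample(schemas, size)  # draw without replacement within one subset
        votes.update(matcher(subset))       # each subset casts one vote per match it finds
    # Keep only matches found in at least `min_vote` of the subsets.
    return {m for m, v in votes.items() if v / n_samples >= min_vote}


if __name__ == "__main__":
    normal = ([frozenset({"title", "author", "isbn"})] * 8
              + [frozenset({"title", "writer", "isbn"})] * 8)
    special = [frozenset({"title", "author", "writer", "isbn"})]
    schemas = normal + special

    print(toy_matcher(schemas))  # set()
    print(sampled_matching(schemas, toy_matcher, n_samples=200, min_vote=0.2))
```

In this made-up demo, a single "special" schema that lists both `author` and `writer` is enough to make the full-set run return no match. Subsets that happen to exclude that schema recover the `author`/`writer` pair, so the vote can restore it. The example only illustrates the mechanism and does not reproduce the paper's experiments.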