Computer Engineering and Applications, 2009, Vol. 45, Issue (5): 138-140. DOI: 10.3778/j.issn.1002-8331.2009.05.040

• Database, Signal and Information Processing

Acquisition of lexicon attribute information in machine readable dictionary

SONG Zi-pan, LU Ru-zhan

  1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2008-08-05 Revised: 2008-10-23 Online: 2009-02-11 Published: 2009-02-11
  • Contact: SONG Zi-pan

Abstract: Acquiring the attribute information of concepts helps build relationships among concepts and, in turn, improves the performance of concept-based applications such as information retrieval. This paper studies how to extract the attribute information of definition entries from a machine-readable dictionary and implements a corresponding system, DAE (Dictionary Attribute Extractor). Based on the idea of bootstrapping, DAE extracts patterns and tuples iteratively. To acquire patterns, it applies a method based on multiple sequence alignment from bioinformatics. To generalize patterns, it uses lexical semantic similarity computation and synonym expansion, which raise pattern coverage. In experiments, the system extracted three attributes, function, color and composition, with good results.

Key words: information extraction, bootstrapping, sequence alignment, semantic similarity
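The abstract says DAE acquires patterns by borrowing multiple sequence alignment from bioinformatics but does not give the details. As a rough illustration only, the Python sketch below aligns tokenized dictionary definitions with Needleman-Wunsch global alignment and turns every mismatch or gap into a wildcard, so the shared frame of the definitions becomes a candidate pattern. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`), the `*` wildcard, the hypothetical helpers `to_pattern` and `induce_pattern`, the order-dependent progressive fold, and the English example definitions are all assumptions made here, not the paper's method.

```python
from functools import reduce
from typing import List, Optional, Tuple

WILDCARD = "*"  # assumed placeholder for positions where the definitions differ


def align(a: List[str], b: List[str], match: int = 2, mismatch: int = -1,
          gap: int = -1) -> Tuple[List[Optional[str]], List[Optional[str]]]:
    """Needleman-Wunsch global alignment of two token sequences.
    Returns two equal-length rows in which None marks an inserted gap."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    row_a: List[Optional[str]] = []
    row_b: List[Optional[str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append(None); i -= 1
        else:
            row_a.append(None); row_b.append(b[j - 1]); j -= 1
    return row_a[::-1], row_b[::-1]


def to_pattern(row_a: List[Optional[str]], row_b: List[Optional[str]]) -> List[str]:
    """Keep identical aligned tokens; replace mismatches and gaps with a wildcard,
    collapsing runs of consecutive wildcards into one."""
    pattern: List[str] = []
    for x, y in zip(row_a, row_b):
        tok = x if x is not None and x == y else WILDCARD
        if tok == WILDCARD and pattern and pattern[-1] == WILDCARD:
            continue
        pattern.append(tok)
    return pattern


def induce_pattern(sentences: List[List[str]]) -> List[str]:
    """Hypothetical progressive scheme: fold each sentence into the running pattern.
    A true multiple alignment would align all sequences jointly instead."""
    return reduce(lambda pat, s: to_pattern(*align(pat, s)), sentences[1:], sentences[0])


if __name__ == "__main__":
    defs = ["a tool used for cutting paper".split(),
            "a device used for measuring time".split(),
            "an instrument used for writing".split()]
    print(induce_pattern(defs))
```

With these made-up definitions the shared frame `used for` survives, and the trailing wildcard marks the slot where a function value would be extracted. Because the fold processes sentences in order, its result can depend on input order, which is one reason real multiple alignment tools use joint or guide-tree-based alignment.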