Implementation of rule induction-based information extraction system

doi:10.3778/j.issn.1002-8331.2008.21.046

Computer Engineering and Applications, 2008, Vol. 44, Issue (21): 166-170.

• Machine Learning •

SHI Qian¹, CHEN Rong¹,², LU Ming-yu¹

1. School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026, China
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China

Received: 2008-04-30 Revised: 2008-05-26 Online: 2008-07-21 Published: 2008-07-21
Contact: SHI Qian

Abstract

With the rapid growth of Web information, information extraction (IE) techniques are well suited to automatically extracting data of interest from large collections of Web documents. This paper presents the design and implementation of an IE system with rule induction capability, which automates Web information retrieval through Document Object Model (DOM) parsing and the definition of retrieval, extraction, and mapping rules. Within the rule induction framework, the authors focus on experiments with the WHISK learning algorithm for generating extraction patterns. The experimental results show that the system performs well on both single-slot and multi-slot extraction tasks.

Key words: information extraction, extraction rule, DOM, learning algorithm
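
To make the terms in the abstract concrete, the following is a minimal sketch of multi-slot extraction from a Web page. It is not the authors' system and it does not perform WHISK rule induction: the rule is written by hand as a regular expression in the spirit of a WHISK-style pattern (skip text, capture a digit before "BR", skip text, capture a number after "$"). Python's standard `html.parser` stands in for full DOM parsing, and the names `TextCollector`, `page_text`, `RENTAL_RULE`, `extract_rentals`, and the sample page are all hypothetical.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-empty text nodes of an HTML document.

    Assumption: a flat list of text nodes is enough for this sketch;
    a real system would keep the DOM tree structure.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def page_text(html):
    """Return the visible text of the page as one space-joined string."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hand-written two-slot rule (hypothetical): bedrooms and price.
RENTAL_RULE = re.compile(r"(?P<bedrooms>\d+)\s*BR\b.*?\$\s*(?P<price>\d+)")


def extract_rentals(html):
    """Apply the rule repeatedly and map each match to a record."""
    return [
        {"bedrooms": int(m.group("bedrooms")), "price": int(m.group("price"))}
        for m in RENTAL_RULE.finditer(page_text(html))
    ]


# Made-up sample page, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <p>Capitol Hill - 1 BR twnhme, fplc, nice area. <b>$675</b></p>
  <p>Downtown - 3 BR upper flr of turn of ctry home. <b>$995</b></p>
</body></html>
"""

if __name__ == "__main__":
    for record in extract_rentals(SAMPLE_HTML):
        print(record)
```

A single-slot rule would capture only one field per match (for example, the price alone), whereas the multi-slot rule above ties both fields to the same listing. A rule induction system would learn such patterns from annotated examples instead of having them written by hand.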

SHI Qian, CHEN Rong, LU Ming-yu. Implementation of rule induction-based information extraction system [J]. Computer Engineering and Applications, 2008, 44(21): 166-170.
