Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (14): 121-129.

• Database, Signal and Information Processing •

Survey of Chinese data cleaning

YE Ou1, ZHANG Jing1, LI Junhuai2

  1. School of Computer Science & Engineering, Xi’an University of Technology, Xi’an 710048, China
  • Online: 2012-05-11 Published: 2012-05-14

Abstract: This paper surveys research on Chinese data cleaning. It clarifies the relationship between total data quality management and data cleaning, and gives the definition and objects of data cleaning. It introduces the background of the Chinese data cleaning problem, the state of research in China and abroad, and current research hotspots. It also briefly presents the basic principles and models of data cleaning and analyzes existing algorithms. In view of the situation in China and the needs of practical projects, it focuses on the methods of Chinese data cleaning. Finally, it summarizes the shortcomings of current Chinese data cleaning research and discusses future research directions and applications.

Key words: Chinese data cleaning, data quality management, data integration