Computer Engineering and Applications, 2021, Vol. 57, Issue (14): 27-38. DOI: 10.3778/j.issn.1002-8331.2101-0187

• Hot Topics and Reviews •

Review of Missing Data Processing Methods

XIONG Zhongmin, GUO Huaiyu, WU Yuexin   

  1. School of Information, Shanghai Ocean University, Shanghai 201306, China
  • Online: 2021-07-15 Published: 2021-07-14

Abstract:

In the era of big data, the explosive growth of data has made data acquisition easier, but it has also made missing data more common. Missing data greatly reduces the usefulness of a dataset, and handling it has become an active research topic in big data processing. This article first introduces the research significance of the missing data problem and the current state of research in China and abroad. It then systematically analyzes the causes of missing data and classifies missing data problems. Next, it reviews missing data processing methods proposed in China and abroad in recent years, summarizing their respective advantages and disadvantages, scope of application, and evaluation metrics, with a focus on filling (imputation) methods such as regression filling and cluster filling. Finally, it summarizes the field of missing data processing methods and discusses future prospects.

Key words: missing data, missing classification, filling method, method comparison, effect evaluation