Computer Engineering and Applications, 2023, Vol. 59, Issue (3): 13-22. DOI: 10.3778/j.issn.1002-8331.2207-0179

Research Hotspots and Reviews

Review of Data Normalization Methods

YANG Hanyu, ZHAO Xiaoyong, WANG Lei   

1. School of Information & Management, Beijing Information Science & Technology University, Beijing 100129, China
2. Beijing Advanced Innovation Center for Materials Genome Engineering, Beijing Information Science & Technology University, Beijing 100129, China
Online: 2023-02-01; Published: 2023-02-01




Abstract: In recent years, artificial intelligence has been widely applied across many fields with remarkable results. Data normalization is an important step in deploying artificial intelligence applications: it helps prevent neural networks from modeling data incorrectly because features are measured on different scales and in different units. In big data scenarios, a substantial portion of the data arrives for training sequentially, in the form of streams, so data normalization in the streaming setting is a key problem that urgently needs to be solved. Although many reviews of normalization research exist, most cover only the normalization of batch data and lack a summary of normalization methods for stream data, which limits their value as references for the streaming case. Building on research into batch data normalization, this paper systematically organizes and analyzes the literature on stream data normalization, proposes a classification scheme oriented toward stream data, and divides data normalization methods into batch data normalization methods and stream data normalization methods. It also compares the principles and advantages of these methods and the main problems each can solve. Finally, it outlines future research directions for data normalization in different scenarios.

Key words: normalization, data stream, deep learning, data mining
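The abstract's distinction between batch and stream normalization can be illustrated with a small sketch. It assumes z-score standardization as a representative method and uses Welford's online mean/variance update as one common way to maintain statistics over a stream; neither choice is confirmed as a method covered in the paper, and the names `batch_zscore` and `StreamingZScore` are hypothetical.

```python
import math
from typing import List


def batch_zscore(xs: List[float]) -> List[float]:
    """Batch z-score: mean and std are computed once over the full dataset."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # population variance
    std = math.sqrt(var)
    return [(x - mean) / std if std > 0 else 0.0 for x in xs]


class StreamingZScore:
    """Stream z-score (hypothetical helper): running statistics via Welford's algorithm."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Incorporate one new sample without storing past samples.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def normalize(self, x: float) -> float:
        # Scale x using only the statistics of samples seen so far.
        if self.n == 0:
            return 0.0
        std = math.sqrt(self.m2 / self.n)
        return (x - self.mean) / std if std > 0 else 0.0


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
batch = batch_zscore(data)  # uses mean 5.0, std 2.0 for every sample
stream = StreamingZScore()
online = []
for x in data:
    stream.update(x)
    online.append(stream.normalize(x))
```

In this sketch, early stream samples are scaled with statistics from only the samples seen so far, so they differ from the batch values (the second sample maps to 1.0 online but -0.5 in batch). Once every sample has arrived, the running mean and standard deviation match the batch statistics. This shifting of statistics over time is the kind of issue that motivates treating stream normalization separately from batch normalization.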
