Incremental updating method for big data feature learning

Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (12): 21-26.

BU Fanyu1,2, CHEN Zhikui1, ZHANG Qingchen1

1. School of Software Technology, Dalian University of Technology, Dalian, Liaoning 116620, China
2. College of Vocation, Inner Mongolia University of Finance and Economics, Hohhot 010010, China

Online: 2015-06-15 Published: 2015-06-30

Abstract

Abstract: In the era of big data, data are generated at extremely high speed, and both their content and their distribution change dynamically. A learning algorithm for neural networks should therefore adapt to new instances while preserving prior knowledge. However, a feed-forward neural network trained with the typical back-propagation (BP) algorithm is a static learning model: it is not incremental in nature and cannot learn the features of dynamically changing big data in real time. To address this problem, this paper proposes a feature learning model for big data that supports incremental updating. The parameters are updated quickly and incrementally by optimizing an objective function, and a squared error function is minimized so that the network retains its original knowledge during the update. For data whose features change frequently, the network structure is updated by increasing the number of hidden neurons, which is done only if adapting the parameters would perturb the prior knowledge severely. After the parameters and structure have been updated, singular value decomposition (SVD) of the weight matrix is used to optimize the updated structure by removing the redundant connections of each newly added hidden unit, which improves the generalization ability of the model. Experimental results demonstrate that the proposed model can learn the features of dynamic big data in real time by continually updating its parameters and structure while preserving the prior knowledge of the network as much as possible.

Key words: big data, feed-forward neural networks, incremental learning, singular value decomposition (SVD)
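
The two mechanisms named in the abstract, a parameter update that stays close to the existing parameters and an SVD step that discards redundant structure in a weight matrix, can be illustrated with a small NumPy sketch. The abstract gives neither the objective function nor the pruning rule, so everything below is an assumption made for illustration and not the authors' method: `incremental_update` uses a squared error on one new sample plus a quadratic penalty on the distance to the old weights of a single linear unit, and `svd_prune` uses a generic low-rank truncation by retained energy. The names `incremental_update`, `svd_prune`, `lam`, and `energy` are hypothetical.

```python
import numpy as np


def incremental_update(w_old, x, y, lam=1.0):
    """Hypothetical single-sample update for one linear unit.

    Minimizes 0.5 * (y - w @ x) ** 2 + 0.5 * lam * ||w - w_old|| ** 2,
    i.e. fit the new sample while staying close to the old weights.
    Setting the gradient to zero gives the closed form below.
    """
    residual = y - w_old @ x
    return w_old + x * residual / (lam + x @ x)


def svd_prune(W, energy=0.99):
    """Hypothetical pruning rule (assumes W is not all zeros).

    Keeps the fewest singular values whose squared sum reaches the
    fraction `energy` of the total, and rebuilds W from those components.
    Returns the low-rank matrix and the number of components kept.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    return (U[:, :k] * s[:k]) @ Vt[:k], k


# Usage: a larger lam preserves more of the old weights.
w_old = np.array([1.0, -2.0, 0.5])
x, y = np.array([0.5, 1.0, -1.0]), 2.0
w_new = incremental_update(w_old, x, y, lam=0.5)

# Usage: the third singular value is negligible, so it is dropped.
W_pruned, kept = svd_prune(np.diag([3.0, 2.0, 1e-3]), energy=0.99)
```

In this sketch `lam` controls the trade-off the abstract describes: a small value fits the new sample closely, while a large value leaves the weights almost unchanged. A full model would also need a criterion for when to add hidden neurons, which the abstract does not specify.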

BU Fanyu, CHEN Zhikui, ZHANG Qingchen. Incremental updating method for big data feature learning[J]. Computer Engineering and Applications, 2015, 51(12): 21-26.
