Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (2): 171-175. DOI: 10.3778/j.issn.1002-8331.1811-0241

• Pattern Recognition and Artificial Intelligence •

Behavior Recognition Algorithm Based on Context Feature Fusion

QI Dajian, DU Huimin, ZHANG Xia, CHANG Libo

  1. School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
  • Online: 2020-01-15 Published: 2020-01-14

Abstract: LSTM networks alone cannot adequately capture short-term temporal information, which limits the accuracy of human behavior recognition. Motivated by the human visual recognition mechanism, in which a convolutional network extracts features to detect the appearance of an object and an LSTM detects its motion, this paper proposes a jointly optimized convolutional long short-term memory (LSTM) architecture based on context feature fusion for behavior recognition from RGB data only. 3D convolution kernels extract the spatial features and short-term temporal features of the input action sequence, and the multi-channel information is fused. The fused features are sent to a next-level convolutional neural network and an LSTM layer, which learn long-term temporal features and capture the long-term spatiotemporal information of the context. Finally, a Softmax classifier classifies the human behavior. Experimental results on the public human behavior recognition dataset UCF-101 show that the proposed network reaches an average recognition accuracy of 93.62%, which is 1.28% higher than that of the convolutional LSTM network without feature fusion, while the average detection time is reduced by 37.1%.

Key words: behavior recognition, deep learning, convolutional neural network, long short-term memory (LSTM) network, contextual feature fusion
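
The data flow described in the abstract (3D convolution, multi-channel fusion, LSTM over time, Softmax) can be illustrated with a minimal, untrained sketch in pure Python. It is not the authors' implementation. The 3×3×3 kernel size, fusion by element-wise summation across the RGB channels, the hidden size, the random weights, and all function names are assumptions made for illustration, and the next-level convolutional network is replaced by simply flattening each fused frame.

```python
import math
import random

def conv3d(clip, kernel):
    """Valid 3D cross-correlation with ReLU for one channel.
    clip: [T][H][W] floats, kernel: [kt][kh][kw] floats."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        frame = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(max(s, 0.0))  # ReLU
            frame.append(row)
        out.append(frame)
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h, c, W):
    """One LSTM step without biases; W[g] has shape hidden x (len(x) + hidden)."""
    z = x + h  # concatenate input and previous hidden state
    def gate(name, act):
        return [act(sum(w * v for w, v in zip(row, z))) for row in W[name]]
    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    return [v / s for v in e]

def init_params(feat_dim, hidden, n_classes, seed=0):
    """Random, untrained weights (hypothetical sizes)."""
    rng = random.Random(seed)
    u = lambda: rng.uniform(-0.1, 0.1)
    return {
        # one 3x3x3 kernel per RGB channel (assumed kernel size)
        "kernels": [[[[u() for _ in range(3)] for _ in range(3)]
                     for _ in range(3)] for _ in range(3)],
        "lstm": {g: [[u() for _ in range(feat_dim + hidden)]
                     for _ in range(hidden)] for g in "ifog"},
        "out": [[u() for _ in range(hidden)] for _ in range(n_classes)],
    }

def classify(clip_rgb, params):
    """clip_rgb: [3][T][H][W]. Returns class probabilities."""
    # 1) Short-term spatiotemporal features per channel via 3D convolution.
    per_channel = [conv3d(ch, k) for ch, k in zip(clip_rgb, params["kernels"])]
    # 2) Fuse channels by element-wise sum (assumed fusion rule).
    fused = [[[sum(vals) for vals in zip(*rows)] for rows in zip(*frames)]
             for frames in zip(*per_channel)]
    # 3) Long-term temporal modelling: one LSTM step per fused frame.
    hidden = len(params["out"][0])
    h, c = [0.0] * hidden, [0.0] * hidden
    for frame in fused:
        x = [v for row in frame for v in row]  # flatten (stand-in for the next CNN)
        h, c = lstm_step(x, h, c, params["lstm"])
    # 4) Softmax over class scores computed from the last hidden state.
    logits = [sum(w * v for w, v in zip(row, h)) for row in params["out"]]
    return softmax(logits)

# Tiny random RGB clip: 3 channels, 6 frames of 5x5 pixels.
rng = random.Random(1)
clip = [[[[rng.random() for _ in range(5)] for _ in range(5)]
         for _ in range(6)] for _ in range(3)]
# Valid 3x3x3 convolution leaves 4 frames of 3x3, so 9 features per step.
params = init_params(feat_dim=9, hidden=4, n_classes=101)  # UCF-101 has 101 classes
probs = classify(clip, params)
```

With random weights the resulting probabilities carry no meaning; the sketch only shows how short-term features from the 3D convolution are fused before the recurrent layer accumulates long-term context. A practical version would use a deep learning framework, trained weights, and a real second-stage convolutional network.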