Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (12): 134-140. DOI: 10.3778/j.issn.1002-8331.1903-0416

• Pattern Recognition and Artificial Intelligence •

Real-Time Framework for Facial Expression Recognition in Complex Environments Based on Face Segmentation

吕诲,童倩倩,袁志勇   

  1. School of Computer Science, Wuhan University, Wuhan 430072
  • Publication date: 2020-06-15 Release date: 2020-06-09

Real-Time Architecture for Facial Expression Recognition in Complex Scenes Based on Face Region Segmentation

LV Hui, TONG Qianqian, YUAN Zhiyong   

  1. School of Computer Science, Wuhan University, Wuhan 430072, China
  • Online: 2020-06-15 Published: 2020-06-09

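Abstract:

Facial expression recognition in complex environments is more challenging than in controlled environments because of factors such as face pose, occlusion, and illumination. To address the low recognition accuracy in complex environments, and the low recognition efficiency caused by the complex network structures currently used for expression recognition, this paper proposes a real-time framework for expression recognition in complex environments based on face segmentation. The framework comprises FsNet (Face segmentation Network) for face region segmentation and TcNet (Tiny classification Network) for expression recognition. FsNet aims to segment the face regions most relevant to expression recognition in order to improve the recognition accuracy of TcNet; its training dataset is constructed from existing datasets. Both network structures are kept compact so that the overall framework meets real-time requirements. Experiments on two complex-scene facial expression databases, FER-2013 and RAD-DB, show that face region segmentation helps improve the recognition rate of facial expressions in complex environments, and that the overall framework achieves good recognition performance while remaining real-time.

Keywords: expression recognition, face segmentation, convolutional neural network, real time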

Abstract:

Facial Expression Recognition (FER) in complex scenes is more challenging than in controlled environments because of factors such as facial pose, occlusion, and illumination. To address the low recognition accuracy of FER in complex scenes, as well as the low efficiency caused by the complicated network structures currently used for FER, a real-time architecture for FER in complex scenes based on face region segmentation is proposed. It is composed of a face segmentation network called FsNet (Face segmentation Network) and an expression recognition network called TcNet (Tiny classification Network). FsNet segments the face area most relevant to FER in order to improve the recognition accuracy of TcNet; its training data is built from existing databases. Both FsNet and TcNet are designed to be simple enough to meet the real-time requirements of the architecture. Experiments on two natural expression databases (FER-2013 and RAD-DB) show that face segmentation improves FER accuracy in complex scenes, and that the overall architecture achieves good performance while meeting the real-time requirement.

Key words: facial expression recognition, face segmentation, Convolutional Neural Network (CNN), real time
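
The two-stage design described in the abstract (segment the face region first, then classify the expression) can be illustrated with a minimal sketch. The abstract does not say how the FsNet output is passed to TcNet, so multiplying the image by a binary mask is an assumption here. The functions `toy_segment` and `toy_classify` and the two labels are hypothetical stand-ins, not the authors' networks or classes.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel intensities in [0, 1]
Mask = List[List[int]]     # 1 = keep pixel (face region), 0 = discard


def apply_mask(image: Image, mask: Mask) -> Image:
    """Zero out every pixel that the mask marks as background."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [[px if keep else 0.0 for px, keep in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]


def recognize(image: Image,
              segment: Callable[[Image], Mask],
              classify: Callable[[Image], Sequence[float]],
              labels: Sequence[str]) -> str:
    """Stage 1: segment the face region. Stage 2: classify the masked image."""
    face_only = apply_mask(image, segment(image))
    scores = classify(face_only)
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


# Hypothetical stand-in for a segmentation network: keep bright pixels only.
def toy_segment(image: Image) -> Mask:
    return [[1 if px >= 0.5 else 0 for px in row] for row in image]


# Hypothetical stand-in for a classifier: two scores from the mean intensity.
def toy_classify(image: Image) -> List[float]:
    pixels = [px for row in image for px in row]
    mean = sum(pixels) / len(pixels)
    return [mean, 1.0 - mean]


if __name__ == "__main__":
    img = [[0.9, 0.1], [0.8, 0.2]]
    print(recognize(img, toy_segment, toy_classify, ["class_a", "class_b"]))
```

Passing the two stages as callables keeps the pipeline independent of any particular network, so either stand-in could be swapped for a trained model without touching `recognize`.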