Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (7): 220-228. DOI: 10.3778/j.issn.1002-8331.2106-0334

• Graphics and Image Processing •

Lightweight Portrait Segmentation Algorithm Combining MobileNetv2 and Attention Mechanism

WANG Xin, WANG Meili, BIAN Dangwei

  1. College of Information Engineering, Northwest A&F University, Xianyang, Shaanxi 712100, China
  2. Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, Xianyang, Shaanxi 712100, China
  3. Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Xianyang, Shaanxi 712100, China
  4. Northwest Institute of Mechanical & Electrical Engineering, Xianyang, Shaanxi 712100, China
  • Online: 2022-04-01  Published: 2022-04-01

Abstract: To address the low accuracy and efficiency of portrait segmentation, a lightweight algorithm combining MobileNetv2 and an attention mechanism is proposed for segmenting half-body portraits. Building on a U-shaped encoder-decoder structure, MobileNetv2 serves as the backbone network and the upsampling process is streamlined, which substantially reduces the number of network parameters and eases transfer and training. The attention mechanism helps the network learn portrait features more effectively, and a mixed loss function improves the classification of hard pixels along portrait edges. The network takes a half-body portrait as input and outputs the corresponding image mask. Experiments on the public Human_Matting and EG1800 datasets show that the proposed algorithm reaches an accuracy of 98.3% (Matting) and 97.8% (EG1800), a clear improvement over PortraitNet (96.3% (Matting), 95.8% (EG1800)) and DeepLabv3+ (96.8% (Matting), 96.4% (EG1800)), and it cleanly separates the target person from the background. Its IoU reaches 98.6% (Matting) and 98.2% (EG1800), and on the experimental platform it segments each test image in about 0.015 s on average, making it suitable for lightweight applications and offering a new theoretical basis and research direction for portrait segmentation.
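The abstract does not give the exact form of the mixed loss function. A common formulation for binary portrait masks combines per-pixel binary cross-entropy with a Dice term, which is sensitive to region overlap and so weights edge pixels more heavily; the sketch below uses that formulation, and the `alpha` weight and the Dice choice are assumptions, not the paper's definition:

```python
import numpy as np

def mixed_loss(pred, target, alpha=0.5, eps=1e-7):
    """Hypothetical mixed loss: a weighted sum of binary cross-entropy
    and Dice loss. pred and target are float arrays of the same shape
    (H, W); pred holds foreground probabilities in [0, 1]."""
    pred = np.clip(pred, eps, 1.0 - eps)
    # Per-pixel binary cross-entropy, averaged over the image.
    bce = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    # Dice loss: 1 - 2|P∩T| / (|P| + |T|); penalizes poor mask overlap.
    inter = np.sum(pred * target)
    dice = 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)
    return alpha * bce + (1 - alpha) * dice
```

With `alpha` near 1 the loss behaves like plain cross-entropy; lowering it shifts emphasis toward overall mask overlap.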

Key words: portrait segmentation, MobileNetv2, encoder-decoder, attention mechanism, mixed loss
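The IoU figures reported above measure the overlap between the predicted mask and the ground-truth mask; for a binary portrait mask it can be computed as:

```python
import numpy as np

def iou(pred_mask, gt_mask):
    """Intersection-over-Union between two binary masks (0/1 arrays)."""
    pred_mask = pred_mask.astype(bool)
    gt_mask = gt_mask.astype(bool)
    union = np.logical_or(pred_mask, gt_mask).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred_mask, gt_mask).sum() / union
```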