Bag of convolutional words networks for visual recognition

Computer Engineering and Applications, 2016, Vol. 52, Issue (21): 180-187.

XUE Kunnan, XUE Yueju, MAO Liang, LIU Hongshan

School of Electronic Engineering, South China Agricultural University, Guangzhou 510642, China

Online: 2016-11-01 Published: 2016-11-17

Abstract

Abstract: In recent years, convolutional neural networks (CNNs) have made important progress in visual recognition thanks to their powerful feature learning ability. To address the sensitivity of the fully connected layers in CNNs to image transformations such as translation, rotation and scaling, a hybrid model called the bag of convolutional words network (BoCW-Net) is proposed. It embeds the BoW model into the CNN architecture in place of the fully connected layers, and learns the features, dictionary and classifier end to end. To enable supervised learning of the whole BoCW-Net, a BoCW encoding based on direction similarity is proposed. Meanwhile, to make full use of the discriminative power of both mid-level and high-level features, a mid-level auxiliary classifier is integrated with the high-level classifier to form a main-auxiliary ensemble classifier. Experimental results show that, compared with fully connected layers, the BoCW representation is more invariant to a variety of transformations; that the main-auxiliary ensemble classifier effectively fuses mid-level and high-level features and improves the recognition performance of BoCW-Net; and that, compared with recently developed CNN models, BoCW-Net achieves improved recognition performance on the CIFAR-10, CIFAR-100 and MNIST datasets, with final test error rates of 4.88%, 22.48% and 0.21%, respectively.

Key words: convolutional neural networks, Bag of Convolutional Words (BoCW) representation, main-auxiliary ensemble classifier
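
The abstract above does not spell out the BoCW encoding, so the following is only a minimal illustrative sketch and not the authors' implementation. It assumes that "direction similarity" can be read as cosine similarity between each local convolutional feature vector and each dictionary word, that negative similarities are clipped to zero, and that the responses are averaged over spatial positions. The function names `cosine` and `bocw_encode` and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def bocw_encode(local_features, dictionary):
    """Orderless encoding (assumed form): for each dictionary word, average the
    clipped cosine similarity over all local feature vectors."""
    hist = [0.0] * len(dictionary)
    if not local_features:
        return hist
    for feature in local_features:
        for j, word in enumerate(dictionary):
            # Clipping at zero is an assumption, not taken from the paper.
            hist[j] += max(0.0, cosine(feature, word))
    return [h / len(local_features) for h in hist]


# Toy data: three local feature vectors (one per spatial position)
# and a two-word dictionary.
features = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
dictionary = [[1.0, 0.0], [0.0, 1.0]]

code = bocw_encode(features, dictionary)
# Reordering the spatial positions leaves the encoding unchanged
# (up to floating-point rounding).
shuffled_code = bocw_encode(list(reversed(features)), dictionary)
```

Because the responses are pooled over positions, the encoding does not depend on where a local feature occurs, which is the general bag-of-words property the abstract relies on when it replaces fully connected layers. In a trainable network the dictionary would be a learned parameter rather than a fixed list.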

XUE Kunnan, XUE Yueju, MAO Liang, LIU Hongshan. Bag of convolutional words networks for visual recognition[J]. Computer Engineering and Applications, 2016, 52(21): 180-187.
