Computer Engineering and Applications, 2012, Vol. 48, Issue (21): 191-196.

Image representation based on visual vocabulary shape description

WANG Hongxia¹, YANG Kejian¹, ZHANG Min², AI Haojun², CHEN Xianqiao¹

  1. School of Computer Science & Technology, Wuhan University of Technology, Wuhan 430063, China
  2. School of Computer, Wuhan University, Wuhan 430072, China
  Online: 2012-07-21   Published: 2014-05-19

Abstract: The Spatial Pyramid Matching (SPM) approach introduces spatial layout into image representation through approximate global geometric correspondence, but it does not account for translation, scaling, and rotation of visual objects in images. This paper proposes an image representation method based on a visual vocabulary shape description model. In this method, a spatial geometric model is constructed relative to the geometric center of each visual word, which guarantees translation invariance. The paper further presents log-polar spatial pyramid matching: the log-polar radius is normalized to achieve scale invariance, and a dominant orientation of the polar angle is determined during spatial pyramid partitioning to achieve rotation invariance. Experiments evaluating and comparing the proposed method were conducted on the Caltech-101 dataset and on a dataset built by the authors. Experimental results show that the proposed method improves classification accuracy, especially on datasets containing images with obvious translation, scaling, and rotation changes, and that it is more robust, as indicated by its smaller variance.
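The abstract describes the pipeline only at a high level, so the following Python sketch illustrates one plausible reading of it for a single visual word; it is not the authors' implementation. Everything concrete here is an assumption: the occurrences of one visual word are given as (x, y) keypoint coordinates; radii are normalized by the largest radius and binned into rings spaced at powers of 1/2; the dominant orientation is taken as the direction of the occurrence farthest from the geometric center; pyramid level l uses 2**l rings by 2**l angular sectors; and the level weights follow the standard SPM kernel (1/2^L for level 0, 1/2^(L-l+1) for level l). All function names (`log_polar_histogram`, `pyramid_descriptor`, `pyramid_match`, `transform`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def log_polar_histogram(points: Sequence[Point], n_rings: int, n_sectors: int) -> List[int]:
    """Count one visual word's occurrences in an n_rings x n_sectors log-polar grid.

    `points` must be non-empty. All binning rules are illustrative assumptions.
    """
    n = len(points)
    # Translation invariance: measure positions from the word's geometric center.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    rel = [(x - cx, y - cy) for x, y in points]
    radii = [math.hypot(dx, dy) for dx, dy in rel]

    # Scale invariance (assumed rule): divide radii by the largest radius, then
    # use log-spaced rings (1/2, 1], (1/4, 1/2], ...; the last ring takes the rest.
    r_max = max(radii)
    # Rotation invariance (assumed rule): measure angles from the direction of
    # the occurrence farthest from the center (the first one wins on ties).
    ref = radii.index(r_max)
    theta_ref = math.atan2(rel[ref][1], rel[ref][0])

    hist = [0] * (n_rings * n_sectors)
    sector_width = 2 * math.pi / n_sectors
    for (dx, dy), r in zip(rel, radii):
        rho = r / r_max if r_max > 0 else 0.0
        if rho == 0:
            ring = n_rings - 1  # a point at the center goes in the innermost ring
        else:
            ring = min(n_rings - 1, math.floor(-math.log2(rho)))
        phi = (math.atan2(dy, dx) - theta_ref) % (2 * math.pi)
        sector = min(n_sectors - 1, int(phi / sector_width))
        hist[ring * n_sectors + sector] += 1
    return hist


def pyramid_descriptor(points: Sequence[Point], levels: int = 2) -> List[List[int]]:
    """Level l uses 2**l rings x 2**l sectors (an assumed layout)."""
    return [log_polar_histogram(points, 2 ** l, 2 ** l) for l in range(levels + 1)]


def pyramid_match(h1: List[List[int]], h2: List[List[int]]) -> float:
    """SPM-style kernel: weighted histogram intersection summed over levels."""
    top = len(h1) - 1
    score = 0.0
    for l, (a, b) in enumerate(zip(h1, h2)):
        weight = 1 / 2 ** top if l == 0 else 1 / 2 ** (top - l + 1)
        score += weight * sum(min(x, y) for x, y in zip(a, b))
    return score


def transform(pts: Sequence[Point], angle: float, scale: float, tx: float, ty: float) -> List[Point]:
    """Rotate by `angle`, scale by `scale`, then translate by (tx, ty)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty) for x, y in pts]


shape = [(4, 1), (-1, 2), (-2, -2), (-1, -1)]
moved = transform(shape, math.radians(37), 2.5, 10.0, -3.0)

print(pyramid_descriptor(shape))
# [[4], [2, 1, 0, 1], [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
print(pyramid_descriptor(shape) == pyramid_descriptor(moved))  # True
print(pyramid_match(pyramid_descriptor(shape), pyramid_descriptor(moved)))  # 4.0
```

Because the descriptor depends only on positions relative to the center, on radius ratios, and on angles measured from the reference direction, rotating, scaling, or translating the point set leaves it unchanged, apart from floating-point effects for points lying exactly on a bin boundary. How the paper combines such per-word descriptors into a full image representation is not stated in the abstract.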

Key words: object categorization, bag-of-visual-words, image representation, spatial pyramid matching, visual vocabulary shape description model
