Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (20): 218-227. DOI: 10.3778/j.issn.1002-8331.2504-0324

• Pattern Recognition and Artificial Intelligence •

Unsupervised Semantic Segmentation Based on Object-Aware Semantic Cues

HE Qixiang, GUO Hongyu, CHEN Qizhi, LIU Yulong   

  1. Department of System 8, The 15th Research Institute of China Electronics Technology Group Corporation, Beijing 100083, China
  • Online: 2025-10-15  Published: 2025-10-15

Abstract: The reliance on extensive pixel-level annotations in traditional semantic segmentation has led to the exploration of unsupervised approaches. Recent advances leverage the deep features of self-supervised vision Transformers and have driven progress in unsupervised semantic segmentation (USS). However, because local feature encodings lack explicit object-level semantic representations, objects with intricate structures remain difficult to segment well. To overcome this limitation, a novel USS framework named OASES (object-aware segmentation system) is introduced, focusing on enhancing object-centric representation learning. The method integrates a spectral analysis process that extracts semantic and structural cues by analyzing the eigenvalues derived from the semantic similarity matrix of deep image features and from the color affinity of the image. In addition, an object-centric contrastive loss encourages the model to learn object-level representations that remain consistent both within and across images, thereby improving segmentation accuracy. Comprehensive experiments on the COCO-Stuff and Cityscapes datasets confirm that OASES achieves state-of-the-art USS performance, delivering accurate and consistent results across complex visual scenes.
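
To make the spectral-analysis step concrete, below is a minimal illustrative sketch (in PyTorch, not the authors' implementation) of how cues of this kind can be derived: a semantic affinity built from self-supervised ViT patch features is mixed with a color affinity, and the eigendecomposition of the resulting normalized Laplacian supplies the spectral cues. The function name spectral_cues, the per-patch color input, the Gaussian bandwidth, and the 0.5 mixing weight are assumptions made for illustration only.

import torch
import torch.nn.functional as F

def spectral_cues(feats: torch.Tensor, patch_color: torch.Tensor,
                  num_vecs: int = 4, mix: float = 0.5) -> torch.Tensor:
    """feats: (N, D) ViT patch features; patch_color: (N, 3) per-patch mean color.
    Returns (N, num_vecs) low-frequency eigenvectors used as segmentation cues."""
    # Semantic affinity: non-negative cosine similarity between patch features.
    f = F.normalize(feats, dim=-1)
    w_sem = (f @ f.t()).clamp(min=0)

    # Color affinity: Gaussian kernel on per-patch color distances (assumed form).
    dist = torch.cdist(patch_color, patch_color)
    w_col = torch.exp(-dist.pow(2) / (2 * dist.mean().pow(2) + 1e-8))

    # Combine the two affinities and form the symmetric normalized Laplacian.
    w = mix * w_sem + (1 - mix) * w_col
    d_inv_sqrt = torch.diag(w.sum(dim=-1).clamp(min=1e-8).rsqrt())
    lap = torch.eye(w.shape[0], device=w.device) - d_inv_sqrt @ w @ d_inv_sqrt

    # Eigenvalues/eigenvectors come back in ascending order; the smallest ones
    # (after the trivial constant eigenvector) carry coarse object structure.
    _, eigvecs = torch.linalg.eigh(lap)
    return eigvecs[:, 1:1 + num_vecs]

In deep spectral pipelines, eigenvectors of this kind are typically clustered (e.g. with k-means) into coarse object masks that the rest of the framework then refines.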

Key words: unsupervised semantic segmentation (USS), object-level semantic cues, spectral analysis, contrastive learning
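
The object-centric contrastive component can likewise be sketched in a generic InfoNCE-style form, assuming that object-level embeddings have already been pooled from the discovered masks in two views (or two images) and matched one-to-one. The function name, the symmetric formulation, and the temperature of 0.07 are assumptions, not details taken from the paper.

import torch
import torch.nn.functional as F

def object_contrastive_loss(obj_emb: torch.Tensor, pos_emb: torch.Tensor,
                            temperature: float = 0.07) -> torch.Tensor:
    """obj_emb, pos_emb: (K, D) matched object-level embeddings from two views."""
    q = F.normalize(obj_emb, dim=-1)
    k = F.normalize(pos_emb, dim=-1)
    logits = q @ k.t() / temperature                      # (K, K) pairwise similarities
    targets = torch.arange(q.shape[0], device=q.device)   # positives lie on the diagonal
    # Symmetric cross-entropy: each object embedding should match only its
    # counterpart, pulling same-object pairs together and pushing others apart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

Losses of this form pull representations of the same object together and push different objects apart, which is one way to encourage the within- and cross-image consistency described in the abstract.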