Generalized Zero-shot Learning with Multi-source Semantic Embeddings for Scene Recognition

Xinhang Song, Haitao Zeng, Sixian Zhang, Luis Herranz, Shuqiang Jiang
(ACMMM 2020)

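This paper studies scene data, which is more complex than object data, and proposes a feature-generating zero-shot learning framework. Its main contributions are: 1) zero-shot learning that fuses multiple sources of semantic descriptions; 2) scene description enhancement based on local region descriptions. To generate visual features for unseen classes, we propose a two-step generative framework: local semantic descriptions are first sampled to produce virtual samples, and local visual features are then generated and fused into a global feature. Finally, the generated visual features of unseen classes are merged with the features extracted from seen classes to train a joint classifier. To validate the method, we introduce a new dataset with multiple kinds of semantic descriptions. Experimental results show that the proposed framework achieves the best results on both SUN Attribute and the proposed dataset.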

Abstract

Recognizing visual categories from semantic descriptions is a promising way to extend the capability of a visual classifier beyond the concepts represented in the training data (i.e., seen categories). This problem is addressed by (generalized) zero-shot learning (GZSL) methods, which leverage semantic descriptions (e.g., label embeddings, attributes) that connect unseen categories to seen ones. Conventional GZSL methods are designed mostly for object recognition. In this paper we focus on zero-shot scene recognition, a more challenging setting with hundreds of categories whose differences can be subtle and are often localized in certain objects or regions. Conventional GZSL representations are not rich enough to capture these local discriminative differences. To address these limitations, we propose a feature generation framework with two novel components: 1) multiple sources of semantic information (i.e., attributes, word embeddings and descriptions), and 2) region descriptions that can enhance scene discrimination. To generate synthetic visual features we propose a two-step generative approach, in which local descriptions are sampled and used as conditions to generate visual features. The generated features are then aggregated and used together with real features to train a joint classifier. To evaluate the proposed method, we introduce a new dataset for zero-shot scene recognition with multi-semantic annotations. Experimental results on the proposed dataset and the SUN Attribute dataset illustrate the effectiveness of the proposed method.
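
The data flow described in the abstract (sample local descriptions, generate conditioned visual features, aggregate them, then train jointly with real features) can be illustrated with a toy sketch. This is not the paper's implementation: the generator below is a fixed random linear map standing in for a trained conditional generator, mean pooling is an assumed aggregation, a nearest-class-mean rule stands in for the joint classifier, and all names, sizes and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
EMB_DIM, NOISE_DIM, FEAT_DIM = 8, 4, 16

# Stand-in for a trained conditional generator (assumption: a random linear map).
W = rng.normal(size=(EMB_DIM + NOISE_DIM, FEAT_DIM))

def generate_local_feature(region_emb, rng):
    """Generate one local visual feature conditioned on a region-description embedding."""
    z = rng.normal(size=NOISE_DIM)
    return np.tanh(np.concatenate([region_emb, z]) @ W)

def synthesize_scene_feature(region_bank, n_regions, rng):
    """Step 1: sample local descriptions. Step 2: generate local features and aggregate."""
    idx = rng.integers(0, len(region_bank), size=n_regions)
    local_feats = np.stack([generate_local_feature(region_bank[i], rng) for i in idx])
    return local_feats.mean(axis=0)  # mean pooling is an assumption

# Toy data: real features for seen classes 0 and 1,
# region-description embeddings for unseen classes 2 and 3.
real_feats = rng.normal(size=(20, FEAT_DIM))
real_labels = np.arange(20) % 2
unseen_region_bank = {c: rng.normal(size=(5, EMB_DIM)) for c in (2, 3)}

synth_feats, synth_labels = [], []
for c, bank in unseen_region_bank.items():
    for _ in range(10):
        synth_feats.append(synthesize_scene_feature(bank, n_regions=3, rng=rng))
        synth_labels.append(c)

# Joint training set: real seen-class features plus synthetic unseen-class features.
X = np.vstack([real_feats, np.array(synth_feats)])
y = np.concatenate([real_labels, np.array(synth_labels)])

# Stand-in joint classifier: nearest class mean over seen and unseen classes.
classes = np.unique(y)
centroids = np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(x):
    """Return the label of the closest class centroid."""
    return int(classes[np.argmin(np.linalg.norm(centroids - x, axis=1))])
```

Calling `predict` on any `FEAT_DIM`-dimensional vector returns one of the four labels, seen or unseen, which is the property that makes the setting generalized rather than plain zero-shot.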


  • Xinhang Song, Haitao Zeng, Sixian Zhang, Luis Herranz, Shuqiang Jiang. 2020. Generalized Zero-shot Learning with Multi-source Semantic Embeddings for Scene Recognition. In 28th ACM International Conference on Multimedia (MM ’20), October 12–16, 2020, Seattle, WA, USA. ACM, New York, NY, USA.