Learning Scene Attribute for Scene Recognition

Haitao Zeng, Xinhang Song, Gongwei Chen, Shuqiang Jiang
(IEEE Transactions on Multimedia 2019)
(Accepted September 17, 2019)


Scene recognition has long been a challenging task in computer vision and multimedia. Current scene recognition methods often extract object features and scene features with CNNs and combine the two to obtain complementary, discriminative scene representations. However, when scene categories are visually similar, object features may lack discriminative power, so relying on object features alone is questionable. In contrast to existing work, this paper discusses the discriminative power of scene attributes in local regions and uses scene attributes as features complementary to object and scene features. We extract these visual features from two separate CNN branches: one extracts global features of the image, and the other extracts features of local regions. Through a contextual modeling framework, we aggregate these features to generate more discriminative scene representations, which perform better than aggregating object and scene features alone. Moreover, by aggregating more complementary visual features, we achieve new state-of-the-art performance on two standard scene recognition benchmarks: MIT67 (88.06%) and SUN397 (74.12%).
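
The two-branch idea in the abstract can be illustrated with a minimal sketch. It assumes the global branch yields one feature vector per image and the local branch yields one vector per region. The abstract does not specify the contextual modeling framework, so average pooling followed by concatenation is used here purely as a stand-in. The function names, feature dimensions, and random inputs are all hypothetical.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps avoids division by zero)."""
    return v / (np.linalg.norm(v) + eps)

def aggregate_scene_representation(global_feat, region_feats):
    """Combine one global feature vector with a set of local-region features.

    global_feat:  shape (d_global,), e.g. output of a global CNN branch
    region_feats: shape (num_regions, d_local), e.g. output of a local branch

    Assumption: mean pooling plus concatenation stands in for the paper's
    contextual modeling framework, which the abstract does not describe.
    """
    pooled_local = region_feats.mean(axis=0)  # summarize all regions
    return np.concatenate(
        [l2_normalize(global_feat), l2_normalize(pooled_local)]
    )

# Hypothetical inputs: random vectors in place of real CNN activations.
rng = np.random.default_rng(0)
global_feat = rng.standard_normal(8)
region_feats = rng.standard_normal((5, 4))

representation = aggregate_scene_representation(global_feat, region_feats)
print(representation.shape)
```

Normalizing each branch before concatenation keeps either feature type from dominating purely because of its scale. A classifier trained on `representation` would then see both global and local evidence.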

  • Haitao Zeng, Xinhang Song, Gongwei Chen, Shuqiang Jiang. “Learning Scene Attribute for Scene Recognition”, IEEE Transactions on Multimedia 2019 (Accepted)