Shuqiang Jiang's homepage
Shuqiang Jiang
Ph.D.
Tel: 010-62600505
Email: sqjiang@ict.ac.cn
Address: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Zhongguancun, Haidian District, Beijing 100190, China

See More for Scene: Pairwise Consistency Learning for Scene Classification

Gongwei Chen, Xinhang Song, Bohan Wang, Shuqiang Jiang
(NeurIPS 2021), December 6-14, 2021
[PDF]


Abstract

Scene classification is a valuable classification subtask with its own characteristics, which still need more in-depth study. Scene characteristics are distributed over the whole image, so a model needs to "see" comprehensive and informative regions. Previous works mainly focus on region discovery and aggregation, while rarely considering the inherent properties of CNNs and their potential to satisfy the requirements of scene classification. In this paper, we propose to understand scene images and scene classification CNN models in terms of the focus area. From this new perspective, we find that scene classification CNN models prefer a large focus area as a consequence of learning scene characteristics. Meanwhile, analyzing existing training schemes helps us understand the effects of the focus area and raises the question of the optimal training method for scene classification. To make better use of scene characteristics, we propose a new learning scheme with a tailored loss aimed at activating a larger focus area on scene images. Since supervision for the target regions to be enlarged is usually lacking, our alternative learning scheme erases the already activated area and allows the CNN models to activate more area during training. The scheme is implemented by keeping the pairwise consistency between the output for the erased image and the output for its original. In particular, a tailored loss is proposed to keep this pairwise consistency by leveraging category-relevance information. Experiments on Places365 show significant improvements from our method with various CNNs. Our method yields inferior results on the object dataset ImageNet, which experimentally indicates that it captures characteristics unique to scenes.
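The erase-and-compare idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the exact erasing rule or the form of the tailored loss. Several choices here are assumptions. The focus area is taken from a class activation map (CAM), the "activated area" is whatever exceeds a fixed threshold, and the hypothetical category-relevance weighting is a KL divergence restricted to the original prediction's top-k classes. All function names (`class_activation_map`, `focus_area`, `erase_focus`, `pairwise_consistency_loss`) are hypothetical. For simplicity the CAM is assumed to already match the image's spatial size; a real pipeline would upsample it.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """CAM for one class. features: (C, H, W) last conv maps; fc_weights: (K, C)."""
    # Weighted sum of feature maps using the classifier weights of class_idx.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0.0)  # keep only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # normalize to [0, 1]


def focus_area(cam, threshold=0.5):
    """Fraction of spatial positions whose normalized activation reaches threshold (assumed metric)."""
    return float((cam >= threshold).mean())


def erase_focus(image, cam, threshold=0.5):
    """Zero out the already activated region. image: (3, H, W); cam: (H, W), same H, W (assumption)."""
    mask = cam >= threshold
    erased = image.copy()
    erased[:, mask] = 0.0  # erase every channel at activated positions
    return erased


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def pairwise_consistency_loss(logits_orig, logits_erased, top_k=5):
    """Hypothetical consistency loss: KL(p || q) over the top-k classes of the original prediction.

    Restricting to the top-k classes stands in for 'category-relevance information';
    the paper's actual tailored loss may differ.
    """
    p = softmax(logits_orig)
    q = softmax(logits_erased)
    idx = np.argsort(p)[::-1][:top_k]  # most relevant categories for the original image
    p_k = p[idx] / p[idx].sum()
    q_k = q[idx] / q[idx].sum()
    return float(np.sum(p_k * (np.log(p_k) - np.log(q_k))))
```

In an actual training loop, one would presumably pass the erased image through the same CNN, treat the original image's output as the target, and add this term to the usual cross-entropy loss. Penalizing disagreement then pushes the network to find evidence outside the erased region, which enlarges its focus area.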

Gongwei Chen, Xinhang Song, Bohan Wang, and Shuqiang Jiang. "See More for Scene: Pairwise Consistency Learning for Scene Classification." 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Dec. 6-14, 2021.


