Amorphous Region Context Modeling for Scene Recognition

Haitao Zeng, Xinhang Song, Gongwei Chen, Shuqiang Jiang
(IEEE Transactions on Multimedia 2020)
[PDF]

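Scene images typically consist of foreground and background regional contents. Some existing methods extract regional contents with dense grids. Such grids can split an object into several discrete parts, leaving the semantic meaning of the resulting patches unclear. Objectness-based methods, meanwhile, may attend only to the foreground contents of a scene image, leaving the background contents and spatial structure incomplete. In contrast to existing methods, this paper proposes an approach to resolving semantic ambiguity: it detects the boundaries of the regional contents themselves and uses semantic segmentation to precisely localize their amorphous contours. In addition, complete foreground and background information from the image is incorporated when constructing the scene representation. By modeling these regions with a graph neural network, the method explores the contextual relations between regions and obtains discriminative scene feature representations for scene recognition. Experimental results on MIT67 and SUN397 demonstrate the effectiveness and generality of the proposed method.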

Abstract

Scene images are usually composed of foreground and background regional contents. Some existing methods extract regional contents with dense grids or objectness region proposals. However, dense grids may split an object into several discrete parts, introducing semantic ambiguity into the patches. Objectness methods may focus on particular objects, but they attend only to the foreground contents and do not exploit the background, which is key to scene recognition. In contrast, we propose a novel scene recognition framework with amorphous region detection and context modeling. In the proposed framework, discriminative regions are first detected with amorphous contours that tightly surround the targets, using semantic segmentation techniques. In addition, both foreground and background regions are jointly embedded with a graph model to obtain the scene representations. Based on the graph modeling module, we explore the contextual relations between the regions in their geometric and morphological aspects, and generate discriminative representations for scene recognition. Experimental results on MIT67 and SUN397 demonstrate the effectiveness and generality of the proposed method.
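
The pipeline outlined in the abstract (segment the image into amorphous regions, embed each region, then relate the regions through a graph) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `region_features`, `graph_context`, and `scene_descriptor` are hypothetical, the segmentation label map is assumed to be given, and a fixed Gaussian adjacency over region centroids stands in for the learned graph neural network described in the paper.

```python
import numpy as np


def region_features(feature_map, label_map):
    """Average-pool a feature map inside each segmentation region.

    feature_map: (H, W, C) array of per-pixel features (assumed precomputed).
    label_map:   (H, W) integer array, one label per amorphous region.
    Returns (labels, feats of shape (R, C), centroids of shape (R, 2)).
    """
    labels = np.unique(label_map)
    feats, centroids = [], []
    for lab in labels:
        mask = label_map == lab
        # Pool only the pixels inside the region's contour, so foreground
        # and background regions are treated the same way.
        feats.append(feature_map[mask].mean(axis=0))
        ys, xs = np.nonzero(mask)
        centroids.append([ys.mean(), xs.mean()])
    return labels, np.stack(feats), np.array(centroids)


def graph_context(feats, centroids, sigma=2.0):
    """One round of message passing over a region graph.

    Assumption: edge weights are a Gaussian of centroid distance,
    row-normalized. The paper learns its graph model instead.
    """
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist_sq = (diff ** 2).sum(axis=-1)
    adj = np.exp(-dist_sq / (2.0 * sigma ** 2))
    adj = adj / adj.sum(axis=1, keepdims=True)
    return adj @ feats


def scene_descriptor(feature_map, label_map):
    """Pool the context-aware region features into one scene vector."""
    _, feats, centroids = region_features(feature_map, label_map)
    return graph_context(feats, centroids).mean(axis=0)
```

In practice the label map would come from a semantic segmentation network and the resulting descriptor would feed a classifier. Both steps are omitted here.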


  • Haitao Zeng, Xinhang Song, Gongwei Chen, Shuqiang Jiang. “Amorphous Region Context Modeling for Scene Recognition”, IEEE Transactions on Multimedia (TMM), 2020. (Accepted December 7, 2020)