Expressional Region Retrieval

Xiaoqian Guo, Xiangyang Li, Shuqiang Jiang
(ACMMM 2020)

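Image retrieval is an important research topic in the multimedia field, with a wide range of applications. Regions in an image carry rich information, but previous retrieval methods are either limited to a single object in the image or attend only to the visual scene of the whole image. This paper proposes a new image retrieval task, Expressional Region Retrieval. The task focuses on image regions and takes their language descriptions into account. The paper explores image retrieval based on expressible regions in images, using both visual and language information to improve retrieval performance.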

Abstract

Image retrieval is a long-standing topic in the multimedia community due to its various applications, e.g., product search and artwork retrieval in museums. The regions in images contain a wealth of information. Users may be interested in the objects presented in the image regions or in the relationships between them. However, previous retrieval methods are either limited to a single object in an image or focus on the entire visual scene. In this paper, we introduce a new task called expressional region retrieval, in which the query is formulated as an image region with its associated description. The goal is to find images containing content similar to the query and to localize the matching regions within them. As far as we know, this task has not been explored yet. We propose a framework to address it. Region proposals are first generated by region detectors, and language features are extracted. Then the Gated Residual Network (GRN) takes language information as a gate to control the transformation of visual features. In this way, the combined visual and language representation is more specific and discriminative for expressional region retrieval. We evaluate our method on a newly established benchmark constructed from the Visual Genome dataset. Experimental results demonstrate that our model effectively utilizes both visual and language information, outperforming the baseline methods.
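
The gating idea in the abstract can be illustrated with a minimal sketch. It is not the paper's GRN: the abstract does not give the architecture, so the sigmoid gate, the tanh transformation, the residual sum, the cosine similarity, and all names (`gated_residual`, `W_gate`, `W_trans`, `cosine`) are assumptions chosen for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gated_residual(visual, language, W_gate, W_trans):
    """Hypothetical gated residual fusion (an assumption, not the paper's GRN).

    The language feature produces a gate in (0, 1) per visual dimension.
    The gate scales a transformed copy of the visual feature, which is
    then added back to the original visual feature (residual connection).
    """
    gate = [sigmoid(g) for g in matvec(W_gate, language)]
    transformed = [math.tanh(t) for t in matvec(W_trans, visual)]
    return [v + g * t for v, g, t in zip(visual, gate, transformed)]


def cosine(a, b):
    # Cosine similarity, assumed here as the retrieval score between
    # a fused query region and a fused candidate region.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy example with made-up 2-d visual and 3-d language features.
visual = [1.0, -2.0]
language = [0.5, 0.5, 0.5]
W_gate = [[0.1, 0.2, 0.3], [0.0, 0.0, 0.0]]
W_trans = [[1.0, 0.0], [0.0, 1.0]]
fused = gated_residual(visual, language, W_gate, W_trans)
```

In this sketch, a gate value near 0 leaves the visual feature almost unchanged, while a value near 1 lets the full transformation through, which is one way language could "control" the visual representation.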


  • Xiaoqian Guo, Xiangyang Li, Shuqiang Jiang. 2020. Expressional Region Retrieval. In 28th ACM International Conference on Multimedia (MM ’20), October 12–16, 2020, Seattle, WA, USA. ACM, New York, NY, USA. https://doi.org/10.1145/3394171.3413567