Shuqiang Jiang's Homepage
Shuqiang Jiang
Ph.D., Professor, Ph.D. Supervisor
Tel:
010-62600505
Email:
sqjiang@ict.ac.cn
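Address:
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Haidian District, Beijing 100190, China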

MUCH: MUtual Coupling enHancement of scene recognition and dense captioning.

Xinhang Song, Bohan Wang, Gongwei Chen and Shuqiang Jiang,
(ACM Multimedia 2019), 21-25 October 2019, Nice, France
[PDF]

Abstract

Because scenes are abstract, comprehensive scene understanding requires semantic modeling at both the global and local levels. Scene recognition is usually studied from a global point of view, while dense captioning is typically studied for local regions. Previous work models scene recognition and dense captioning separately. In contrast, we propose a joint learning framework that benefits from the mutual coupling of scene recognition and dense captioning models. These two tasks are coupled in two steps: 1) fusing the supervision by considering the contexts between scene labels and local captions, and 2) jointly optimizing semantically symmetric LSTM models. In particular, to balance the bias between dense captioning and scene recognition, a scene-adaptive non-maximum suppression (NMS) method is proposed to emphasize scene-related regions in the region proposal procedure, and a region-wise and category-wise weighted pooling method is proposed to avoid over-attention to particular regions in the local-to-global pooling procedure. For model training and evaluation, scene labels are manually annotated for the Visual Genome database. Experimental results on Visual Genome show the effectiveness of the proposed method. Moreover, the proposed method can also improve previous CNN-based works on public scene databases such as MIT67 and SUN397.
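
The abstract does not give the formulas behind the scene-adaptive NMS or the weighted pooling, so the following is only a rough illustration of the two ideas, not the paper's method. It assumes, purely for illustration, that each region proposal carries a hypothetical `scene_relevance` value that rescales its detector score before standard greedy NMS, and that local-to-global pooling is a normalized weighted average over regions (the category-wise weighting is omitted). All function and parameter names are invented for this sketch.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def scene_weighted_nms(boxes, scores, scene_relevance, iou_threshold=0.5):
    """Greedy NMS over scores rescaled by a per-region scene relevance.

    Assumption: the rescaling rule (a plain product) is hypothetical and
    stands in for whatever scene-adaptive rule the paper actually uses.
    Returns the indices of the kept boxes, best first.
    """
    ranked = sorted(
        range(len(boxes)),
        key=lambda i: scores[i] * scene_relevance[i],
        reverse=True,
    )
    keep = []
    for i in ranked:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def weighted_pool(region_features, region_weights):
    """Pool per-region feature vectors into one vector by weighted average.

    Assumption: region-wise weights only; weights must not sum to zero.
    """
    total = sum(region_weights)
    dim = len(region_features[0])
    return [
        sum(w * f[d] for f, w in zip(region_features, region_weights)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    # Plain NMS keeps the top-scoring box of the overlapping pair.
    print(scene_weighted_nms(boxes, scores, [1.0, 1.0, 1.0]))
    # A low scene relevance on box 0 lets box 1 survive instead.
    print(scene_weighted_nms(boxes, scores, [0.2, 1.0, 1.0]))
    print(weighted_pool([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0]))
```

With uniform relevance the sketch reduces to ordinary greedy NMS, which makes the effect of the hypothetical scene weighting easy to compare against a standard baseline.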

  • Xinhang Song, Bohan Wang, Gongwei Chen and Shuqiang Jiang. MUCH: MUtual Coupling enHancement of scene recognition and dense captioning. (ACM Multimedia 2019), 21-25 October 2019, Nice, France.


