Shuqiang Jiang

Ph.D

Tel:

010-62600505

Email:

sqjiang@ict.ac.cn

Address:

Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, No.6 Kexueyuan South Road, Zhongguancun, Haidian District, Beijing 100190, China

ION: Instance-level Object Navigation

Weijie Li, Xinhang Song, Yubing Bai, Sixian Zhang, Shuqiang Jiang

(ACM Multimedia 2021), October 20–24, 2021, Chengdu, China

[PDF]

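Visual object navigation is a fundamental and important research topic in Embodied AI, in which an agent navigates to a specified object according to an instruction. Existing work mainly addresses category-level visual object navigation, where reaching any object of the target category counts as success. Practical applications, however, often require finer-grained navigation to one specific target object. For example, when the need is "drink water", we expect the agent to find "our own cup" rather than anyone else's cup. This paper therefore proposes an instance-based visual object navigation task, Instance-level Object Navigation (ION), and designs a corresponding navigation model framework and evaluation criteria. Building on the existing AI2-THOR simulator, we design an object instantiation and automatic annotation system that simulates real-life scenes containing a large number and variety of objects, and that automatically generates annotations describing each object instance in the form <object category, object color, object material, spatial relation>. This work automatically collected 27,735 object instance records, which make up the ION dataset. In addition, for the proposed instance-level visual object navigation task, we propose a cascade framework in which the nodes of an Instance-Relation Graph (IRG) represent the color and material information of object instances and the edges represent the spatial relations between instances. During navigation, detected object instances activate the corresponding IRG nodes through Instance Selection; combined with the target Instance Mask and Instance Grounding, the agent finally locates the target object instance. Experiments confirm that instance-level visual object navigation is challenging and show that the proposed cascade framework outperforms the baseline methods on instance-level evaluation metrics.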
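As a rough illustration of the data structures described above, the following Python sketch models an instance annotation as a <category, color, material, spatial relation> record and an IRG whose nodes hold color and material and whose edges hold spatial relations. All class, field and method names here are hypothetical and chosen for illustration; they are not taken from the ION code base, and the real model uses learned feature vectors rather than strings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InstanceDescription:
    """One annotation record: <category, color, material, spatial relation>."""
    category: str
    color: str
    material: str
    relation: str  # free-text spatial relation; the wording is illustrative


@dataclass
class InstanceRelationGraph:
    """Toy IRG: nodes carry (color, material), edges carry spatial relations."""
    nodes: dict = field(default_factory=dict)   # instance id -> (color, material)
    edges: dict = field(default_factory=dict)   # (id_a, id_b) -> relation label
    active: set = field(default_factory=set)    # ids activated by detections

    def add_instance(self, inst_id, color, material):
        self.nodes[inst_id] = (color, material)

    def relate(self, id_a, id_b, relation):
        # Only connect instances that already exist as nodes.
        if id_a in self.nodes and id_b in self.nodes:
            self.edges[(id_a, id_b)] = relation

    def activate(self, detected_ids):
        # Detected instances switch on their matching nodes; unknown ids are ignored.
        self.active |= {i for i in detected_ids if i in self.nodes}


target = InstanceDescription("Cup", "red", "ceramic", "on the table")
irg = InstanceRelationGraph()
irg.add_instance("cup_1", "red", "ceramic")
irg.add_instance("table_1", "brown", "wood")
irg.relate("cup_1", "table_1", "on")
irg.activate(["cup_1", "lamp_9"])  # "lamp_9" is not in the graph
```

In this toy version activation is a simple set membership update; in the described framework the activated nodes would instead feed a graph network that informs the navigation policy.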

Abstract

Visual object navigation is a fundamental task in Embodied AI. Previous works focus on category-wise navigation, in which navigating to any instance of the target object category is considered a success. Those methods may be effective for finding general objects. However, it may be more practical to navigate to a specific instance in real life, since particular requirements are usually satisfied by specific instances rather than by all instances of one category. How to navigate to a specific instance has rarely been researched and is challenging for current methods. In this paper, we introduce a new task of Instance Object Navigation (ION), where instance-level descriptions of targets are provided and instance-level navigation is required. In particular, multiple types of attributes such as colors, materials and object references are involved in the instance-level descriptions of the targets. To equip the agent with the ability of instance navigation, we propose a cascade framework with an Instance-Relation Graph (IRG) based navigator and an instance grounding module. To distinguish different instances of the same object category, we construct an instance-level graph instead of a category-level one, where instances are regarded as nodes, encoded with representations of colors, materials and locations (bounding boxes). During navigation, the detected instances activate the corresponding nodes in the IRG, which are updated with a graph convolutional neural network (GCNN). The final instance prediction is obtained by the grounding module, which selects the candidate instance with maximum probability (a joint probability of category, color and material, obtained by the corresponding regressors with softmax). For task evaluation, we build a benchmark for instance-level object navigation on the AI2-THOR simulator, where over 27,735 object instance descriptions and navigation ground truth are automatically obtained through interaction with the simulator. The proposed model outperforms the baseline in instance-level metrics, showing that the proposed graph model can guide instance object navigation while leaving promising room for further improvement. The project is available at https://github.com/LWJ312/ION.
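The grounding step described in the abstract (pick the candidate with the maximum joint probability of category, color and material) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each candidate comes with three logit vectors, one per attribute head, and it assumes the joint probability is the product of the three softmax probabilities of the target attributes. The function names and the dictionary layout are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ground_instance(candidates, target):
    """Return (best_index, scores) for the candidate matching the target best.

    candidates: list of dicts with logit lists under "category", "color", "material".
    target: dict mapping the same keys to the target class index.
    Assumption: the joint probability is the product of the per-attribute
    softmax probabilities (attributes treated as independent).
    """
    scores = []
    for cand in candidates:
        joint = 1.0
        for attr in ("category", "color", "material"):
            joint *= softmax(cand[attr])[target[attr]]
        scores.append(joint)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Two detected cups; only the first one matches the target color (index 1).
candidates = [
    {"category": [2.0, 0.0, 0.0], "color": [0.0, 2.0, 0.0], "material": [0.0, 0.0, 2.0]},
    {"category": [2.0, 0.0, 0.0], "color": [2.0, 0.0, 0.0], "material": [0.0, 0.0, 2.0]},
]
target = {"category": 0, "color": 1, "material": 2}
best, scores = ground_instance(candidates, target)
```

With these inputs the first candidate scores higher, because the second one puts most of its color probability on a different color than the target.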

Weijie Li, Xinhang Song, Yubing Bai, Sixian Zhang, Shuqiang Jiang. “ION: Instance-level Object Navigation”, 29th ACM International Conference on Multimedia (ACM Multimedia 2021), Chengdu, China, October 20-24, 2021.

Download:

2021-Li-ACMMM.pdf