Visual Scene Understanding
Leader: Ruiping Wang (Professor)
Email: ruiping.wang [at] vipl.ict.ac.cn
Introduction to the research group

Our group focuses on comprehensive scene understanding to enable intelligent perception and understanding of natural visual environments in the open world. More specifically, we aim to build a vision-based robot system with basic capabilities comparable to those of the human visual system for real-world scene understanding. These mainly include perceptual tasks such as object detection, object recognition, semantic segmentation, scene classification, attribute learning, and relationship extraction. To support richer natural-language descriptions of visual concepts, the system can also incorporate language models and knowledge-based reasoning for cognitive tasks such as image/video captioning (description) and visual question answering.

Research

Our research topics mainly cover three aspects: 1) object recognition, e.g. zero-shot learning, incremental/life-long learning, image retrieval, and image classification; 2) scene understanding, e.g. object detection/segmentation, scene classification, relationship detection, and scene graph generation; and 3) language/knowledge-based cognition, e.g. image/video captioning (description), visual question answering, visual concept learning, and knowledge graphs.

Papers

Journal Papers

  • Ziyi Bai, Ruiping Wang, Difei Gao, Xilin Chen. Event Graph Guided Compositional Spatial–Temporal Reasoning for Video Question Answering. IEEE Transactions on Image Processing (TIP), Vol. 33, pp. 1109-1121, 2024.
  • Chen He, Ruiping Wang, Shiguang Shan, Xilin Chen. Introspective GAN: Learning to Grow a GAN for Incremental Generation and Classification. Pattern Recognition (PR), 151: 110383, 2024.
  • Xuhan Zhu, Ruiping Wang, Xiangyuan Lan, Yaowei Wang. Local Context Attention Learning for Fine-grained Scene Graph Generation. Pattern Recognition (PR), 156: 110708, 2024.
  • Shishi Qiao, Ruiping Wang, Shiguang Shan, Xilin Chen. Hierarchical Image-to-image Translation with Nested Distributions Modeling. Pattern Recognition (PR), 146: 110058, 2024.
  • Xiaodong Wu, Ruiping Wang, Xilin Chen. Data-efficient 3D Instance Segmentation by Transferring Knowledge from Synthetic Scans. Pattern Recognition Letters (PRL), Vol. 179, pp. 151-157, 2024.
  • Wenbin Wang, Ruiping Wang, Shiguang Shan, Xilin Chen. Importance First: Generating Scene Graph of Human Interest. International Journal of Computer Vision, vol. 131, no. 10, pp. 2489-2515, Oct. 2023.
  • Shishi Qiao, Ruiping Wang, Shiguang Shan, Xilin Chen. Hierarchical Disentangling Network for Object Representation Learning. Pattern Recognition, vol. 140, pp. 1-15, Aug. 2023.
  • Difei Gao, Ruiping Wang, Shiguang Shan and Xilin Chen. CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 45, no. 5, pp. 5561-5578, 2023.
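  • Chen He, Ruiping Wang, Xilin Chen. Rethinking Class Orders and Transferability in Class Incremental Learning. Pattern Recognition Letters (PRL), 161: 67-73, 2022.
  • Wenbin Wang, Ruiping Wang, Xilin Chen. Balanced Scene Graph Generation Assisted by an Additional Biased Predictor. SCIENTIA SINICA Informationis, 52(11): 2075-2092, 2022. (in Chinese)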
  • Shishi Qiao, Ruiping Wang, Shiguang Shan and Xilin Chen. Deep Video Code for Efficient Face Video Retrieval. Pattern Recognition, 113:107754, 2021.
  • Zhiwu Huang, Ruiping Wang, Xianqiu Li, Wenxian Liu, Shiguang Shan, Luc Van Gool, Xilin Chen, "Geometry-aware Similarity Learning on SPD Manifolds for Visual Recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 10, pp. 2513-2523, Oct. 2018.
  • Wen Wang, Ruiping Wang, Zhiwu Huang, Shiguang Shan, Xilin Chen, “Discriminant Analysis on Riemannian Manifold of Gaussian Distributions for Face Recognition with Image Sets,” IEEE Transactions on Image Processing (TIP), vol. 27, no. 1, pp. 151-163, Jan. 2018.
  • Haomiao Liu, Ruiping Wang, Shiguang Shan and Xilin Chen, “Deep Supervised Hashing for Fast Image Retrieval,” International Journal of Computer Vision, vol. 127, no. 9, pp. 1217–1234, Sep. 2019.
  • Difei Gao, Ruiping Wang, Shiguang Shan, and Xilin Chen, "Learning to Recognize Visual Concepts for Visual Question Answering with Structural Label Space," IEEE Journal of Selected Topics in Signal Processing, 14(3):494-505, 2020.
  • Haomiao Liu, Ruiping Wang, Shiguang Shan, Xilin Chen, “Learning Multifunctional Binary Codes for Personalized Image Retrieval,” International Journal of Computer Vision, vol. 128, no. 8, pp. 2223–2242, Sep. 2020.
  • Shishi Qiao, Ruiping Wang, Shiguang Shan, Xilin Chen, "Deep Heterogeneous Hashing for Face Video Retrieval," IEEE Transactions on Image Processing, vol. 29, no. 1, pp. 1299-1312, Dec. 2020.
  • Haomiao Liu, Ruiping Wang, Shiguang Shan and Xilin Chen. What is Tabby? Interpretable Model Decisions by Learning Attribute-based Classification Criteria. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 43(5):1791–1807, 2021.
  • Huajie Jiang, Ruiping Wang, Shiguang Shan, Yan Li, Haomiao Liu, Xilin Chen, “Attribute Annotation on Large Scale Image Database by Active Knowledge Transfer,” Image and Vision Computing, vol. 78, pp. 1-13, Oct. 2018.

Conference Papers

  • Hanxuan Li, Bin Fu, Ruiping Wang, Xilin Chen. Point2Real: Bridging the Gap between Point Cloud and Realistic Image for Open-World 3D Recognition. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 3055-3063, Vancouver, Canada, Feb. 20-27, 2024.
  • Xuhan Zhu, Yifei Xing, Ruiping Wang, Yaowei Wang, Xiangyuan Lan. Calibration for Long-tailed Scene Graph Generation. ACM Conference on Multimedia (ACM MM), pp. 3037-3046, Melbourne, Australia, Oct. 28-Nov. 1, 2024.
  • Xuhan Zhu, Yifei Xing, Ruiping Wang, Yaowei Wang, Xiangyuan Lan. Hierarchical Prompt Learning for Scene Graph Generation. British Machine Vision Conference (BMVC), Glasgow, UK, Nov. 25-28, 2024.
  • Bin Fu, Qiyang Wan, Jialin Li, Ruiping Wang, Xilin Chen. Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models. British Machine Vision Conference (BMVC), Glasgow, UK, Nov. 25-28, 2024.
  • Chuyan Xiong, Chengyu Shen, Xiaoqi Li, Kaichen Zhou, Jiaming Liu, Ruiping Wang, Hao Dong. Autonomous Interactive Correction MLLM for Robust Robotic Manipulation. Annual Conference on Robot Learning (CoRL), Munich, Germany, Nov. 6-9, 2024.
  • Yaxuan Qin, Jiayu Xu, Ruiping Wang, Xilin Chen. Think before Placement: Common Sense Enhanced Transformer for Object Placement. European Conference on Computer Vision (ECCV), pp. 35-50, Milan, Italy, Sep. 29-Oct. 4, 2024.
  • Ziwei Yao, Ruiping Wang, Xilin Chen. HiFi-Score: Fine-grained Image Description Evaluation with Hierarchical Parsing Graphs. European Conference on Computer Vision (ECCV), pp. 441-458, Milan, Italy, Sep. 29-Oct. 4, 2024.
  • Qiyang Wan, Ruiping Wang, Xilin Chen. Interpretable Object Recognition by Semantic Prototype Analysis. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 800-809, Waikoloa, HI, Jan. 4-8, 2024.
  • Ziyi Bai, Ruiping Wang, Xilin Chen. Glance and Focus: Memory Prompting for Multi-Event Video Question Answering. Annual Conference on Neural Information Processing Systems (NeurIPS), New Orleans, LA, Dec. 10-16, 2023.
  • Fengyuan Yang, Ruiping Wang, Xilin Chen. Semantic Guided Latent Parts Embedding for Few-Shot Learning. IEEE Winter Conference on Applications of Computer Vision (WACV 2023), pp. 5436-5446, Waikoloa, HI, Jan. 3-7, 2023.
  • Hui Nie, Ruiping Wang, Xilin Chen. From Node to Graph: Joint Reasoning on Visual-Semantic Relational Graph for Zero-Shot Detection. Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1648-1657, Waikoloa, Hawaii, Jan. 4-8, 2022.
  • Fengyuan Yang, Ruiping Wang, Xilin Chen. SEGA: Semantic Guided Attention on Visual Prototype for Few-Shot Learning. Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1586–1596, Waikoloa, Hawaii, Jan. 4-8, 2022.
  • Chen He, Ruiping Wang and Xilin Chen. A Tale of Two CILs: The Connections Between Class Incremental Learning and Class Imbalanced Learning and Beyond. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop on Continual Learning in Computer Vision (CLVision), pp. 3559–3569, Virtual Event, Jun. 19-25, 2021.
  • Difei Gao, Ruiping Wang, Ziyi Bai and Xilin Chen. Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments. IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1675-1685, Montreal, Canada, Oct. 11-17, 2021.
  • Jiwei Xiao, Ruiping Wang and Xilin Chen. Holistic Pose Graph: Modeling Geometric Structure among Objects in a Scene using Graph Inference for 3D Object Prediction. IEEE/CVF International Conference on Computer Vision (ICCV), pp. 12717–12726, Montreal, Canada, Oct. 11-17, 2021.
  • Wenbin Wang, Ruiping Wang and Xilin Chen. Topic Scene Graph Generation by Attention Distillation from Caption. IEEE/CVF International Conference on Computer Vision (ICCV), pp. 15900-15910, Montreal, Canada, Oct. 11-17, 2021.
  • Sijin Wang, Ziwei Yao, Ruiping Wang, Zhongqin Wu and Xilin Chen. FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14050–14059, Virtual Event, June 19-25, 2021.
  • Difei Gao, Ke Li, Ruiping Wang, Shiguang Shan, Xilin Chen, "Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text," IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020), pp. 12746–12756, 2020.
  • Wenbin Wang, Ruiping Wang, Shiguang Shan, Xilin Chen, "Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation," Proceedings of the 16th European Conference on Computer Vision (ECCV), LNCS 12358, Vol. 13, pp. 222-239, Glasgow, UK / Cyberspace, August 23-28, 2020.
  • Ruikui Wang, Shishi Qiao, Ruiping Wang, Shiguang Shan, Xilin Chen, "Hybrid Video and Image Hashing for Robust Face Retrieval," IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pp. 168-175, 2020.