Our group focuses on comprehensive scene understanding, with the goal of enabling intelligent perception and understanding of natural visual environments in the open world. More specifically, we aim to build a vision-based robot system with basic capabilities comparable to those of the human visual system for real-world scene understanding. These mainly include perceptual tasks such as object detection, object recognition, semantic segmentation, scene classification, attribute learning, and relationship extraction. To support more advanced natural-language descriptions of visual concepts, the system can also incorporate language models and knowledge-based reasoning for cognitive tasks such as image/video captioning (description) and visual question answering.
The research topics of our group mainly cover three aspects: 1) object recognition, e.g. zero-shot learning, incremental/lifelong learning, image retrieval, and image classification; 2) scene understanding, e.g. object detection/segmentation, scene classification, relationship detection, and scene graph generation; and 3) language/knowledge-based cognition, e.g. image/video captioning (description), visual question answering, visual concept learning, and knowledge graphs.