Human-machine collaboration and interaction depend on comprehensive understanding of multimodal data (e.g., vision, text, and speech) and on explainable, trustworthy multimodal reasoning that acquires and exploits knowledge from both the multimodal information and the interaction process.
Toward trustworthy human-machine collaboration and interaction, the Media Learning and Knowledge Reasoning (MLKR) group works on learning theory, techniques, and prototype systems for multimodal data. Major research topics include multimodal understanding, retrieval, recommendation, question answering and dialogue, and cross-modal content generation. To advance next-generation multimodal AI, the MLKR group also pursues theoretical studies of new multimodal learning architectures and models, drawing on interdisciplinary thinking across AI, statistics, physics, brain science, and social science.
Over the past five years, the MLKR group has published 100+ papers in top-tier conferences and journals, including TPAMI, IJCV, CVPR, ICCV, NeurIPS, ICML, and ACMMM. Some of this work has been deployed in commercial systems, such as multimodal-dialogue-based psychological consultation and web content monitoring.