Media Learning and Knowledge Reasoning
Leader: Shuhui Wang (Professor) / Qingming Huang (Professor)
Email: wangshuhui [at] ict dot ac dot cn; qmhuang [at] ucas dot ac dot cn
Introduction to the research group

Human-machine collaboration and interaction depend on a comprehensive understanding of multimodal data (e.g., vision, text and speech) and on explainable, trustworthy multimodal reasoning that acquires and exploits knowledge from multimodal information and from the interaction process.

Towards trustworthy human-machine collaboration and interaction, the Media Learning and Knowledge Reasoning (MLKR) group works on learning theory, techniques and prototype systems for multimodal data. Major research topics include multimodal understanding, retrieval, recommendation, QA/dialogue and cross-modal content generation. To advance next-generation multimodal AI, the MLKR group also emphasizes the theoretical study of new multimodal learning architectures and models inspired by interdisciplinary thinking that connects AI with statistics, physics, brain science and social science.

Over the past five years, the MLKR group has published 100+ papers in top-tier conferences and journals, including TPAMI, IJCV, CVPR, ICCV, NeurIPS, ICML and ACM MM. Some of this work has been deployed in commercial systems, such as multimodal-dialogue-based psychological consultation and web content monitoring.

Papers

Journal Papers

  • Cong Zhang, Shuhui Wang, Xiaodan Li, Yao Zhu, Honggang Qi, Qingming Huang. Enhancing the Robustness of Vision-Language Foundation Models by Alignment Perturbation. IEEE Transactions on Information Forensics and Security (TIFS), Vol. 20, pp. 7091–7105, 2025.
  • Ting Yu, Binhui Ge, Shuhui Wang, Yan Yang, Qingming Huang, Jun Yu. Consistency Conditioned Memory Augmented Dynamic Diagnosis Model for Medical Visual Question Answering. IEEE Journal of Biomedical and Health Informatics (JBHI), Vol. 29, No. 2, pp. 1357–1370, 2025.
  • Jiaxin An, Liang Cao, Yingxun Wang, Ahmer Khan Jadoon, Shuhui Wang. Adaptive Fault-Tolerant Optimized Platoon Cloud Tracking Control for Heterogeneous Vehicles via Dual Learning Mechanism. IEEE Transactions on Automation Science and Engineering (TASE), Vol. 22, pp. 4382–4393, 2025.
  • Jiehua Zhang, Liang Li, Chenggang Yan, Wei Ke, and Yihong Gong. Monocular Depth Estimation on Adverse Weathers with Curriculum Domain Distribution Alignment. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Vol. 35, No. 1, pp. 178-194, Jan. 2025.
  • Ting Yu, Kunhao Fu, Shuhui Wang, Qingming Huang, Jun Yu. Prompting Video-Language Foundation Models With Domain-Specific Fine-Grained Heuristics for Video Question Answering. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Vol. 35, No. 2, pp. 1615–1630, February 2025.
  • Chao Bi, Shuhui Wang, Na Li, Qingming Huang. Inferential and Commonsense Visual Question Generation. IEEE Transactions on Multimedia (TMM), Vol. 27, pp. 7796–7809, 2025.
  • Liang Li, Tongyu Lu, Yaoqi Sun, Yuhan Gao, Chenggang Yan, Zhenghui Hu and Qingming Huang. Progressive Decision Boundary Shifting for Unsupervised Domain Adaptation. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), Vol. 36, No. 1, pp. 274-285, Jan. 2025.
  • Liang Li, Gaoxiang Cong, Yuankai Qi, Zheng-Jun Zha, Qi Wu, Michael Sheng, Qingming Huang, Ming-Hsuan Yang. Dubbing Movies via Hierarchical Phoneme Modeling and Acoustic Diffusion Denoising. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 47, No. 11, pp. 10361-10377, 2025.
  • Zitai Wang, Qianqian Xu, Zhiyong Yang, Peisong Wen, Yuan He, Xiaochun Cao, Qingming Huang. Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification. International Journal of Computer Vision (IJCV), Vol. 133, No. 1, pp. 211-253, January 2025.
  • Jilong Zhu, Junbao Zhuo, Shuhui Wang. PIC: Domain generalization by path information constraint. Pattern Recognition (PR), 168: 111769, 2025.
  • Peisong Wen, Qianqian Xu, Zhiyong Yang, Yuan He, Qingming Huang. Algorithm-Dependent Generalization of AUPRC Optimization: Theory and Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 7, pp. 5062-5079, 2024.
  • Shilong Bao, Qianqian Xu, Zhiyong Yang, Yuan He, Xiaochun Cao, and Qingming Huang. Improved Diversity-Promoting Collaborative Metric Learning for Recommendation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 12, pp. 9004-9022, 2024.
  • Beichen Zhang, Liang Li, Shuhui Wang, Shaofei Cai, Zheng-Jun Zha, Qi Tian, Qingming Huang. Inductive State-Relabeling Adversarial Active Learning with Heuristic Clique Rescaling. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 12, pp. 9780-9796, 2024.
  • Guorong Li, Hanhua Ye, Yuankai Qi, Shuhui Wang, Laiyun Qing, Qingming Huang, Ming-Hsuan Yang. Learning Hierarchical Modular Networks for Video Captioning. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 2, pp. 1049-1064, 2024.
  • Tianwei Cao, Qianqian Xu, Zhiyong Yang and Qingming Huang. Mitigating Confounding Bias in Practical Recommender Systems with Partially Inaccessible Exposure Status. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 2, pp. 957-974, 2024.
  • Yunbin Tu, Liang Li, Li Su, Zheng-Jun Zha, Qingming Huang. SMART: Syntax-calibrated Multi-Aspect Relation Transformer for Change Captioning. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 7, pp. 4926-4943, 2024.
  • Zhaobo Qi, Shuhui Wang, Weigang Zhang, Qingming Huang. Uncertainty-Boosted Robust Video Activity Anticipation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 12, pp. 7775-7792, 2024.
  • Ke Ma, Qianqian Xu, Jinshan Zeng, Wei Liu, Xiaochun Cao, Yingfei Sun, and Qingming Huang. Sequential Manipulation against Rank Aggregation: Theory and Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 12, pp. 9353-9370, 2024.
  • Beichen Zhang, Liang Li, Zheng-jun Zha, Jiebo Luo, Qingming Huang. Downstream-Pretext Domain Knowledge Traceback for Active Learning. IEEE Transactions on Multimedia (TMM), Vol. 26, pp. 10585-10596, 2024.
  • Zhipeng Yu, Qianqian Xu, Yangbangyan Jiang, Yingfei Sun, and Qingming Huang. Enhancing Sample Utilization in Noise-robust Deep Metric Learning with Subgroup-based Positive-pair Selection. IEEE Transactions on Image Processing (TIP), Vol. 33, pp. 6083-6097, 2024.

Conference Papers

  • Zhiguang Lu, Qianqian Xu, Shilong Bao, Zhiyong Yang, Qingming Huang. Bidirectional Logits Tree: Pursuing Granularity Reconcilement in Fine-Grained Classification. AAAI Conference on Artificial Intelligence (AAAI), pp. 19189–19197, Philadelphia, PA, USA, Feb. 25-Mar. 4, 2025.
  • Guanqi Ding, Chengyu Yang, Shuhui Wang, Xincheng Li, Jinzhe Zhang, Xin Jin, Qingming Huang. Dis²Booth: Learning Image Distribution with Disentangled Features for Text-to-Image Diffusion Models. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 2744–2752, Philadelphia, PA, USA, Feb. 25–Mar. 4, 2025.
  • Shuo Cai, Xinzhe Han, Shuhui Wang. Divide-and-Conquer: Tree-structured Strategy with Answer Distribution Estimator for Goal-Oriented Visual Dialogue. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 1917-1925, Philadelphia, PA, USA, Feb. 25–Mar. 4, 2025.
  • Yuchen Sun, Qianqian Xu, Zitai Wang, Zhiyong Yang, Junwei He. EDGE: Unknown-aware Multi-label Learning by Energy Distribution Gap Expansion. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 12613–12621, Philadelphia, PA, USA, Feb. 25–Mar. 4, 2025.
  • Yunbin Tu, Liang Li, Li Su, Qingming Huang. Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning. 39th Annual AAAI Conference on Artificial Intelligence (AAAI), pp. 7464-7472, Philadelphia, PA, USA, Feb. 25–Mar. 4, 2025.
  • Xingyu Lyu, Qianqian Xu, Zhiyong Yang, Shaojie Lyu, Qingming Huang. SSE-SAM: Balancing Head and Tail Classes Gradually through Stage-Wise SAM. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 19278–19286, Philadelphia, PA, USA, Feb. 25–Mar. 4, 2025.
  • Gaoxiang Cong, Liang Li, Jiadong Pan, Zhedong Zhang, Amin Beheshti, Anton Van Den Hengel, Yuankai Qi, Qingming Huang. FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing. ACM International Conference on Multimedia (ACM MM), Dublin, Ireland, Oct. 27-31, 2025.
  • Jiadong Pan, Liang Li, Hongcheng Gao, Zhengjun Zha, Qingming Huang, Jiebo Luo. SafeCFG: Controlling Harmful Features with Dynamic Safe Guidance for Safe Generation. ACM International Conference on Multimedia (ACM MM), Dublin, Ireland, Oct. 27-31, 2025.
  • Gaoxiang Cong, Jiadong Pan, Liang Li, Yuankai Qi, Yuxin Peng, Anton van den Hengel, Jian Yang, Qingming Huang. EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15863-15873, Nashville, TN, USA, Jun. 11–15, 2025.
  • Zhen Yang, Zhuo Tao, Qi Chen, Yuankai Qi, Liang Li, Anton van den Hengel, Qingming Huang. Separation of powers: On segregating knowledge from observation in LLM-enabled knowledge-based visual question answering. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 24753-24762, Nashville, TN, USA, Jun. 11–15, 2025.
  • Yue Wu, Zhaobo Qi, Junshu Sun, Yaowei Wang, Qingming Huang, Shuhui Wang. Video Language Model Pretraining with Spatio-temporal Masking. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8557-8567, Nashville, TN, USA, Jun. 11–15, 2025.
  • Fanglue Zhang, Shufan Shen, Chao Bi, Li Su, Qingming Huang, Shuhui Wang. SVDLoRA: Data-Driven Low-Rank Adaptation via Spectral Decomposition. IEEE International Conference on Data Mining Workshops (ICDMW), Washington, DC, USA, Dec. 12-15, 2025.
  • Shufan Shen, Zhaobo Qi, Junshu Sun, Qingming Huang, Qi Tian, Shuhui Wang. Enhancing Pre-trained Representation Classifiability can Boost its Interpretability. The Thirteenth International Conference on Learning Representations (ICLR), pp. 38903-38927, Singapore, Singapore, Apr. 24-28, 2025.
  • Yue Wu, Zhaobo Qi, Yiling Wu, Junshu Sun, Yaowei Wang, Shuhui Wang. Learning fine-grained representations through textual token disentanglement in composed video retrieval. The Thirteenth International Conference on Learning Representations (ICLR), pp. 91981-92003, Singapore, Singapore, Apr. 24-28, 2025.
  • Cong Hua, Qianqian Xu, Zhiyong Yang, Zitai Wang, Shilong Bao, Qingming Huang. OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning. International Conference on Machine Learning (ICML), Vancouver, BC, Canada, Jul. 13-19, 2025.
  • Jinzhe Liu, Junshu Sun, Shufan Shen, Chenxue Yang, Shuhui Wang. Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs. Annual Conference on Neural Information Processing Systems (NeurIPS), San Diego, CA, USA, Dec. 2-7, 2025.
  • Junxi Chen, Liang Li, Yunbin Tu, Li Su, Zhe Xue, Qingming Huang. Generalizing Single-Frame Supervision to Event-Level Understanding for Video Anomaly Detection. Annual Conference on Neural Information Processing Systems (NeurIPS), San Diego, CA, USA, Dec. 2-7, 2025.
  • Boyu Han, Qianqian Xu, Shilong Bao, Zhiyong Yang, Kangli Zi, Qingming Huang. LightFair: Towards an Efficient Alternative for Fair T2I Diffusion via Debiasing Pre-trained Text Encoders. Annual Conference on Neural Information Processing Systems (NeurIPS), San Diego, CA, USA, Dec. 2-7, 2025.
  • Junshu Sun, Wanxing Chang, Chenxue Yang, Qingming Huang, Shuhui Wang. Relieving the Over-aggregating Effect in Graph Transformers. Annual Conference on Neural Information Processing Systems (NeurIPS), San Diego, CA, USA, Dec. 2-7, 2025.
  • Shufan Shen, Junshu Sun, Qingming Huang, Shuhui Wang. VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set. Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), San Diego, CA, USA, Dec. 2-7, 2025.