Media Learning and Knowledge Reasoning
Leaders: Shuhui Wang (Professor) / Qingming Huang (Professor)
Email: wangshuhui [at] ict dot ac dot cn; qmhuang [at] ucas dot ac dot cn
Introduction to the research group

Human-machine collaboration and interaction depend on a comprehensive understanding of multimodal data (e.g., vision, text and speech) and on explainable, trustworthy multimodal reasoning that acquires and exploits knowledge from multimodal information and from the interaction process itself.

To enable trustworthy human-machine collaboration and interaction, the Media Learning and Knowledge Reasoning (MLKR) group works on learning theory, techniques and prototype systems for multimodal data. Major research topics include multimodal understanding, retrieval, recommendation, QA/dialogue and cross-modal content generation. To advance next-generation multimodal AI, the MLKR group also emphasizes theoretical study of new multimodal learning architectures and models, inspired by interdisciplinary thinking that connects AI with statistics, physics, brain science and social science.

Over the past 5 years, the MLKR group has published 100+ papers in top-tier conferences and journals, including TPAMI, IJCV, CVPR, ICCV, NeurIPS, ICML and ACM MM. Some of this work has been deployed in commercial systems, such as multimodal-dialogue-based psychological consultation and web content monitoring.

Papers

Journal Papers

  • Peisong Wen, Qianqian Xu, Zhiyong Yang, Yuan He, Qingming Huang. Algorithm-Dependent Generalization of AUPRC Optimization: Theory and Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 7, pp. 5062-5079, 2024.
  • Shilong Bao, Qianqian Xu, Zhiyong Yang, Yuan He, Xiaochun Cao, and Qingming Huang. Improved Diversity-Promoting Collaborative Metric Learning for Recommendation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 12, pp. 9004-9022, 2024.
  • Beichen Zhang, Liang Li, Shuhui Wang, Shaofei Cai, Zheng-Jun Zha, Qi Tian, Qingming Huang. Inductive State-Relabeling Adversarial Active Learning with Heuristic Clique Rescaling. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 12, pp. 9780-9796, 2024.
  • Guorong Li, Hanhua Ye, Yuankai Qi, Shuhui Wang, Laiyun Qing, Qingming Huang, Ming-Hsuan Yang. Learning Hierarchical Modular Networks for Video Captioning. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 2, pp. 1049-1064, 2024.
  • Tianwei Cao, Qianqian Xu, Zhiyong Yang and Qingming Huang. Mitigating Confounding Bias in Practical Recommender Systems with Partially Inaccessible Exposure Status. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 2, pp. 957-974, 2024.
  • Yunbin Tu, Liang Li, Li Su, Zheng-Jun Zha, Qingming Huang. SMART: Syntax-calibrated Multi-Aspect Relation Transformer for Change Captioning. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 7, pp. 4926-4943, 2024.
  • Zhaobo Qi, Shuhui Wang, Weigang Zhang, Qingming Huang. Uncertainty-Boosted Robust Video Activity Anticipation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 12, pp. 7775-7792, 2024.
  • Ke Ma, Qianqian Xu, Jinshan Zeng, Wei Liu, Xiaochun Cao, Yingfei Sun, and Qingming Huang. Sequential Manipulation against Rank Aggregation: Theory and Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 46, No. 12, pp. 9353-9370, 2024.
  • Beichen Zhang, Liang Li, Zheng-jun Zha, Jiebo Luo, Qingming Huang. Downstream-Pretext Domain Knowledge Traceback for Active Learning. IEEE Transactions on Multimedia (TMM), Vol. 26, pp. 10585-10596, 2024.
  • Zhipeng Yu, Qianqian Xu, Yangbangyan Jiang, Yingfei Sun, and Qingming Huang. Enhancing Sample Utilization in Noise-robust Deep Metric Learning with Subgroup-based Positive-pair Selection. IEEE Transactions on Image Processing (TIP), Vol. 33, pp. 6083-6097, 2024.
  • Sheng Fang, Tiantian Dang, Shuhui Wang, Qingming Huang. Linguistic Hallucination for Text-Based Video Retrieval. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Vol. 34, No. 10, pp. 9692-9705, 2024.
  • Yunbin Tu, Liang Li, Li Su, Junping Du, Ke Lu, Qingming Huang. Viewpoint-Adaptive Representation Disentanglement Network for Change Captioning. IEEE Transactions on Image Processing (TIP), Vol. 32, pp. 2620-2635, 2023.
  • Zhaobo Qi, Shuhui Wang, Chi Su, Li Su, Qingming Huang, Qi Tian. Self-Regulated Learning for Egocentric Video Activity Anticipation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 45, No. 6, pp. 6715-6730, 2023.
  • Hao Wang, Zheng-Jun Zha, Liang Li, Xuejin Chen, Jiebo Luo. Semantic and Relation Modulation for Audio-Visual Event Localization. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 45, No. 6, pp. 7711-7725, 2023.
  • Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Zechao Li, Qi Tian, Qingming Huang. Entity-Enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 45, No. 3, pp. 3003-3018, 2023.
  • Weigang Zhang, Zhaobo Qi, Shuhui Wang, Chi Su, Li Su, Qingming Huang. Temporal Dynamic Concept Modeling Network for Explainable Video Event Recognition. ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 19, No. 6, pp. 1-22, 2023.
  • Zhiyong Yang, Qianqian Xu, Shilong Bao, Peisong Wen, Yuan He, Xiaochun Cao and Qingming Huang. AUC-Oriented Domain Adaptation: From Theory to Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 45, No. 12, pp. 14161–14174, Dec. 2023.
  • Yangbangyan Jiang, Qianqian Xu, Yunrui Zhao, Zhiyong Yang, Peisong Wen, Xiaochun Cao, and Qingming Huang. Positive-Unlabeled Learning with Label Distribution Alignment. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 45, No. 12, pp. 15345–15363, Dec. 2023.
  • Zhiyong Yang, Qianqian Xu, Wenzheng Hou, Shilong Bao, Yuan He, Xiaochun Cao and Qingming Huang. Revisiting AUC-oriented Adversarial Training with Loss-Agnostic Perturbations. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 45, No. 12, pp. 15494–15511, Dec. 2023.

Conference Papers

  • Churan Zhi, Junbao Zhuo, Shuhui Wang. Confusing Pair Correction Based on Category Prototype for Domain Adaptation under Noisy Environments. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 17060-17068, Vancouver, Canada, Feb. 20–27, 2024.
  • Yunbin Tu, Liang Li, Li Su, Zheng-Jun Zha, Chenggang Yan, Qingming Huang. Context-aware Difference Distilling for Multi-change Captioning. Annual Meeting of the Association for Computational Linguistics (ACL), pp. 7941-7956, Bangkok, Thailand, Aug. 11–16, 2024.
  • Yiming Cui, Liang Li, Jiehua Zhang, Chenggang Yan, Hongkui Wang, Shuai Wang, Jin Heng, Wu Li. Stochastic Context Consistency Reasoning for Domain Adaptive Object Detection. ACM Conference on Multimedia (ACM MM), pp. 1331-1340, Melbourne, Australia, Oct. 28-Nov 1, 2024.
  • Henglei Lv, Jiayu Xiao, Liang Li. Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization. ACM Conference on Multimedia (ACM MM), pp. 10535-10543, Melbourne, Australia, Oct. 28-Nov 1, 2024.
  • Zhedong Zhang, Liang Li, Gaoxiang Cong, Haibing Yin, Yuhan Gao, Chenggang Yan, Anton van den Hengel, Yuankai Qi. From Speaker to Dubber: Movie Dubbing with Prosody and Duration Consistency Learning. ACM Conference on Multimedia (ACM MM), pp. 7523-7532, Melbourne, Australia, Oct. 28-Nov 1, 2024.
  • Yijia Wang, Qianqian Xu, Yangbangyan Jiang, Siran Dai, Qingming Huang. Regularized Contrastive Partial Multi-view Outlier Detection. ACM Conference on Multimedia (ACM MM), pp. 8711-8720, Melbourne, Australia, Oct. 28-Nov 1, 2024.
  • Kenan Huang, Junbao Zhuo, Shuhui Wang, Chi Su, Qingming Huang, Huimin Ma. Unsupervised Image-to-Video Adaptation via Category-aware Flow Memory Bank and Realistic Video Generation. ACM Conference on Multimedia (ACM MM), pp. 8795-8804, Melbourne, Australia, Oct. 28-Nov 1, 2024.
  • Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai, Qingming Huang. Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval. ACM Conference on Multimedia (ACM MM), pp. 3828-3837, Melbourne, Australia, Oct. 28-Nov 1, 2024.
  • Junwei He, Qianqian Xu, Yangbangyan Jiang, Zitai Wang, Yuchen Sun, Qingming Huang. HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection. ACM Conference on Multimedia (ACM MM), pp. 1544-1553, Melbourne, Australia, Oct. 28-Nov 1, 2024.
  • Junxi Chen, Liang Li, Li Su, Zheng-Jun Zha, Qingming Huang. Prompt-enhanced Multiple Instance Learning for Weakly Supervised Anomaly Detection. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 18319-18329, Seattle WA, USA, Jun. 17-21, 2024.
  • Yunbin Tu, Liang Li, Li Su, Chenggang Yan, Qingming Huang. Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning. European Conference on Computer Vision (ECCV), pp. 311-328, Milan, Italy, Sep 29-Oct 4, 2024.
  • Jiayu Xiao, Henglei Lv, Liang Li, Shuhui Wang, Qingming Huang. R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation. International Conference on Learning Representations (ICLR), Vienna, Austria, May 7-11, 2024.
  • Cong Hua, Qianqian Xu, Shilong Bao, Zhiyong Yang, Qingming Huang. ReconBoost: Boosting Can Achieve Modality Reconcilement. International Conference on Machine Learning (ICML), pp. 19573-19597, Vienna, Austria, Jul 21-27, 2024.
  • Zhengqi Pei, Anran Zhang, Shuhui Wang, Qingming Huang. Modeling Language Tokens as Functionals of Semantic Fields. International Conference on Machine Learning (ICML), pp. 40114-40128, Vienna, Austria, Jul 21-27, 2024.
  • Hongyu Liu, Runmin Cong, Hua Li, Qianqian Xu, Qingming Huang, Wei Zhang. ESNet: Evolution and Succession Network for High-Resolution Salient Object Detection. International Conference on Machine Learning (ICML), pp. 30892-30907, Vienna, Austria, Jul 21-27, 2024.
  • Feiran Li, Qianqian Xu, Shilong Bao, Zhiyong Yang, Runmin Cong, Xiaochun Cao, Qingming Huang. Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection. International Conference on Machine Learning (ICML), pp. 28989-29021, Vienna, Austria, Jul 21-27, 2024.
  • Zhiyong Yang, Qianqian Xu, Zitai Wang, Sicong Li, Boyu Han, Shilong Bao, Xiaochun Cao, Qingming Huang. Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition. International Conference on Machine Learning (ICML), pp. 56624-56664, Vienna, Austria, Jul 21-27, 2024.
  • Zhengqi Pei, Anran Zhang, Shuhui Wang, Xiangyang Ji, Qingming Huang. Data-free Neural Representation Compression with Riemannian Neural Dynamics. International Conference on Machine Learning (ICML), pp. 40129-40144, Vienna, Austria, Jul 21-27, 2024.
  • Zhedong Zhang, Liang Li, Jiehua Zhang, Zhenghui Hu, Hongkui Wang, Chenggang Yan, Jiang Yang, and Yuankai Qi. Generating High-Quality Symbolic Music Using Fine-Grained Discriminators. International Conference on Pattern Recognition (ICPR), pp. 332-344, Kolkata, India, Dec. 1-5, 2024.
  • Benyuan Meng, Qianqian Xu, Zitai Wang, Zhiyong Yang, Xiaochun Cao, Qingming Huang. Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques. Annual Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, Dec. 10-15, 2024.