Conference Paper - Visual Information Processing and Learning (VIPL)

Zhiguang Lu, Qianqian Xu, Shilong Bao, Zhiyong Yang, Qingming Huang. Bidirectional Logits Tree: Pursuing Granularity Reconcilement in Fine-Grained Classification. AAAI Conference on Artificial Intelligence (AAAI), pp. 19189–19197, Philadelphia, PA, USA, Feb. 25–Mar. 4, 2025.

Guanqi Ding, Chengyu Yang, Shuhui Wang, Xincheng Li, Jinzhe Zhang, Xin Jin, Qingming Huang. Dis²Booth: Learning Image Distribution with Disentangled Features for Text-to-Image Diffusion Models. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 2744–2752, Philadelphia, PA, USA, Feb. 25–Mar. 4, 2025.

Shuo Cai, Xinzhe Han, Shuhui Wang. Divide-and-Conquer: Tree-structured Strategy with Answer Distribution Estimator for Goal-Oriented Visual Dialogue. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 1917–1925, Philadelphia, PA, USA, Feb. 25–Mar. 4, 2025.

Yuchen Sun, Qianqian Xu, Zitai Wang, Zhiyong Yang, Junwei He. EDGE: Unknown-aware Multi-label Learning by Energy Distribution Gap Expansion. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 12613–12621, Philadelphia, PA, USA, Feb. 25–Mar. 4, 2025.

Yunbin Tu, Liang Li, Li Su, Qingming Huang. Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning. 39th Annual AAAI Conference on Artificial Intelligence (AAAI), pp. 7464–7472, Philadelphia, PA, USA, Feb. 25–Mar. 4, 2025.

Xingyu Lyu, Qianqian Xu, Zhiyong Yang, Shaojie Lyu, Qingming Huang. SSE-SAM: Balancing Head and Tail Classes Gradually through Stage-Wise SAM. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 19278–19286, Philadelphia, PA, USA, Feb. 25–Mar. 4, 2025.

Gaoxiang Cong, Liang Li, Jiadong Pan, Zhedong Zhang, Amin Beheshti, Anton van den Hengel, Yuankai Qi, Qingming Huang. FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing. ACM International Conference on Multimedia (ACM MM), Dublin, Ireland, Oct. 27–31, 2025.

Jiadong Pan, Liang Li, Hongcheng Gao, Zhengjun Zha, Qingming Huang, Jiebo Luo. SafeCFG: Controlling Harmful Features with Dynamic Safe Guidance for Safe Generation. ACM International Conference on Multimedia (ACM MM), Dublin, Ireland, Oct. 27–31, 2025.

Qiyang Wan, Ruiping Wang, Chengzhi Gao, Xilin Chen. Catch Your Concepts: A Flexible ConceptLocator for Interpretable Visual Recognition. 36th British Machine Vision Conference (BMVC), Sheffield, UK, Nov. 24–27, 2025.

Tianyue Wang, Shuang Yang, Shiguang Shan, Xilin Chen. GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition. 36th British Machine Vision Conference (BMVC), Sheffield, UK, Nov. 24–27, 2025.

Yujie Zhao, Jiabei Zeng, Shiguang Shan. Pose-Robust Calibration Strategy for Point-of-Gaze Estimation on Mobile Phones. 36th British Machine Vision Conference (BMVC), Sheffield, UK, Nov. 24–27, 2025.

Gaoxiang Cong, Jiadong Pan, Liang Li, Yuankai Qi, Yuxin Peng, Anton van den Hengel, Jian Yang, Qingming Huang. EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15863–15873, Nashville, TN, USA, Jun. 11–15, 2025.

Zonghui Guo, Yingjie Liu, Jie Zhang, Haiyong Zheng, Shiguang Shan. Face Forgery Video Detection via Temporal Forgery Cue Unraveling. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7396–7405, Nashville, TN, USA, Jun. 10–17, 2025.

Ziyi Bai, Hanxuan Li, Bin Fu, Chuyan Xiong, Ruiping Wang, Xilin Chen. R2C: Mapping Room to Chessboard to Unlock LLM As Low-Level Action Planner. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 19456–19466, Nashville, TN, USA, Jun. 10–17, 2025.

Zhen Yang, Zhuo Tao, Qi Chen, Yuankai Qi, Liang Li, Anton van den Hengel, Qingming Huang. Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 24753–24762, Nashville, TN, USA, Jun. 10–17, 2025.

Yiheng Li, Ruibing Hou, Hong Chang, Shiguang Shan, Xilin Chen. UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 27805–27815, Nashville, TN, USA, Jun. 10–17, 2025.

Yue Wu, Zhaobo Qi, Junshu Sun, Yaowei Wang, Qingming Huang, Shuhui Wang. Video Language Model Pretraining with Spatio-temporal Masking. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8557–8567, Nashville, TN, USA, Jun. 10–17, 2025.

Yujie Wang, Yunwei Zhao, Jing Yang, Han Han, Shiguang Shan, Jie Zhang. Evaluating Cognitive-Behavioral Fixation via Multimodal User Viewing Patterns on Social Media. The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), Suzhou, China, Nov. 4–9, 2025.

Dan Han, Mingjie He, Jie Zhang, Shiguang Shan. Dual-Branch Partial Annotation Learning for Facial Attributes Recognition. IEEE 19th International Conference on Automatic Face and Gesture Recognition (FG), Tampa/Clearwater, FL, USA, May 26–30, 2025.

Xinkuan Qiu, Meina Kan, Yongbin Zhou, Shiguang Shan. Benchmarking Multimodal Large Language Models Against Image Corruptions. IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, HI, USA, Oct. 19–23, 2025.