The group focuses on frontier research in Medical Visual Intelligence, developing approaches that span "Precise Perception", "Cognitive Reasoning", and "Decision Collaboration". To address cross-modal heterogeneity in medical imaging, the dynamic complexity of surgical scenes, and the high reliability demanded by clinical decisions, we investigate Medical Multimodal Large Models (MLMs), self-supervised representation learning, and controllable visual generation. By integrating multi-source healthcare data, we provide key algorithmic support for computer-aided diagnosis, surgical planning and navigation, and clinical decision-making, empowering clinicians with robust perception, insightful reasoning, and optimized intervention.
(1) Precise Multimodal Perception
We investigate robust visual perception methods built on multimodal medical data. This direction aims to overcome the limits of single-modality perception in complex medical environments (e.g., bleeding, smoke, and occlusion), enabling accurate localization, segmentation, and recognition of medical targets under these challenging conditions.
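As a minimal illustration of this idea (not our actual architecture), the sketch below fuses an RGB endoscopic frame with a hypothetical auxiliary modality, assumed here to be depth, through a learned per-pixel gate, so regions degraded in one stream (smoke, bleeding, occlusion) can be recovered from the other. All module names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class MultimodalFusionSeg(nn.Module):
    """Toy two-stream encoder with gated fusion for binary segmentation.

    Illustrative sketch only: fuses an RGB frame with an auxiliary
    modality so pixels unreliable in one modality can be resolved
    from the other.
    """

    def __init__(self, aux_channels: int = 1, hidden: int = 32):
        super().__init__()
        def encoder(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            )
        self.rgb_enc = encoder(3)
        self.aux_enc = encoder(aux_channels)
        # Per-pixel gate decides how much to trust each modality.
        self.gate = nn.Conv2d(2 * hidden, 1, 1)
        self.head = nn.Conv2d(hidden, 1, 1)  # binary mask logits

    def forward(self, rgb, aux):
        f_rgb, f_aux = self.rgb_enc(rgb), self.aux_enc(aux)
        g = torch.sigmoid(self.gate(torch.cat([f_rgb, f_aux], dim=1)))
        fused = g * f_rgb + (1 - g) * f_aux  # adaptive modality weighting
        return self.head(fused)

if __name__ == "__main__":
    model = MultimodalFusionSeg()
    rgb = torch.randn(2, 3, 64, 64)    # batch of endoscopic frames
    depth = torch.randn(2, 1, 64, 64)  # synthetic auxiliary modality
    print(model(rgb, depth).shape)     # torch.Size([2, 1, 64, 64])
```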
(2) Insightful Cross-modal Reasoning
We establish associative mappings between medical visual features and semantic knowledge to transcend the limitations of traditional "end-to-end" black-box models. This research focuses on deep reasoning guided by multimodal logical anchors, facilitating a shift from simple pattern recognition to interpretable clinical inference.
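A toy sketch of concept-anchored prediction follows: visual features are scored against a bank of named semantic anchors, so each output decomposes into interpretable concept similarities rather than an opaque end-to-end score. The concept list, feature dimension, and learnable anchor bank are illustrative assumptions; in a real pipeline the anchors would typically come from a text encoder over clinical concept descriptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptAnchoredClassifier(nn.Module):
    """Scores images by similarity to named clinical concept anchors.

    Illustrative sketch: every prediction decomposes into cosine
    similarities against interpretable, named concepts.
    """

    def __init__(self, feat_dim: int, concepts: list[str]):
        super().__init__()
        self.concepts = concepts
        # Placeholder anchors; a real system would derive these from
        # a text encoder over concept descriptions.
        self.anchors = nn.Parameter(torch.randn(len(concepts), feat_dim))

    def forward(self, feats):
        f = F.normalize(feats, dim=-1)
        a = F.normalize(self.anchors, dim=-1)
        return f @ a.t()  # cosine similarity to each concept anchor

if __name__ == "__main__":
    concepts = ["polyp", "ulcer", "normal mucosa"]  # hypothetical concepts
    model = ConceptAnchoredClassifier(feat_dim=128, concepts=concepts)
    feats = torch.randn(4, 128)  # stand-in visual features
    sims = model(feats)          # (4, 3) concept similarities
    for name, s in zip(concepts, sims[0].tolist()):
        print(f"{name}: {s:+.3f}")
```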
(3) Full-process Collaborative Optimization
We model the spatio-temporal evolution of surgical scenes to address the nonlinear deviations between static pre-operative plans and the complex, dynamic intra-operative environment. This work aims to develop personalized surgical navigation and evaluation systems that deliver precise, adaptive intra-operative guidance.
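As a schematic of this plan-versus-reality correction, the sketch below blends a static pre-operative target with streaming intra-operative observations via simple exponential smoothing and reports the drift from the original plan. The smoothing gain `alpha` and the simulated tissue drift are made-up assumptions; a deployed system would instead rely on deformable registration or filtering over the full scene state.

```python
import numpy as np

def update_target_estimate(plan_xyz, observed_xyz, prev_estimate, alpha=0.3):
    """Blend a static pre-operative target with intra-operative observations.

    Exponential-smoothing update: the running estimate drifts toward each
    new observation, modelling tissue shift away from the original plan.
    `alpha` is an illustrative smoothing gain, not a tuned value.
    """
    estimate = (1 - alpha) * prev_estimate + alpha * observed_xyz
    drift = np.linalg.norm(estimate - plan_xyz)  # deviation from the plan
    return estimate, drift

if __name__ == "__main__":
    plan = np.array([10.0, 20.0, 30.0])  # planned target position (mm)
    estimate = plan.copy()
    rng = np.random.default_rng(0)
    for t in range(5):  # simulated tissue drift plus observation noise
        observed = plan + np.array([0.4, -0.2, 0.1]) * (t + 1)
        observed += rng.normal(scale=0.05, size=3)
        estimate, drift = update_target_estimate(plan, observed, estimate)
        print(f"t={t}: drift from plan = {drift:.2f} mm")
```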