I am currently a Ph.D. candidate in the CSE department of the Hong Kong University of Science and Technology (HKUST), supervised by Prof. Dit-Yan Yeung. Previously, I was an undergraduate student majoring in Computer Science at Fudan University, where I was honored as an Outstanding Undergraduate of Shanghai (上海市优秀毕业生) and supervised by Prof. Yanwei Fu. My research focuses on Multi-modal Learning and Generative Models, aiming to build reliable Multi-modal AI systems from a data-centric perspective. Currently, I'm trying to answer three questions: 1) How to construct end-to-end Multi-modal LLMs with frontier visual, textual, and speech capabilities? 2) How to construct 3D visual world models in a controllable and scalable manner? 3) How to enhance Multi-modal LLMs via training with synthetic data and world models?
👋 I'm currently on the job market for both academia and industry. Feel free to email me if we are a good fit!
Some recent works include:
Full publication list on Google Scholar. (* denotes equal contribution; highlighted blocks denote representative works.)
Works are organized with respect to topics, including:
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2025
Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts
International Conference on Learning Representations (ICLR), 2023 (spotlight, top 25%).
Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing
AAAI Conference on Artificial Intelligence (AAAI), 2022.
Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
Transactions on Machine Learning Research (TMLR), 2025.
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
European Conference on Computer Vision (ECCV), 2024.
Annual Meeting of the Association for Computational Linguistics (ACL), 2025.
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
International Conference on Learning Representations (ICLR), 2024.
European Conference on Computer Vision (ECCV), 2024.
Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025.
CODA: A Real-World Corner Case Dataset for Object Detection in Autonomous Driving
European Conference on Computer Vision (ECCV), 2022.
Workshop on Autonomous Driving, Vision and Learning Seminar (VALSE), 2023 (spotlight).
MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
Implicit Concept Removal of Diffusion Models
European Conference on Computer Vision (ECCV), 2024.
MagicDrive: Street View Generation with Diverse 3D Geometry Control
International Conference on Learning Representations (ICLR), 2024.
TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025.
GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation
International Conference on Learning Representations (ICLR), 2024.
Mixed Autoencoder for Self-supervised Visual Representation Learning
IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2023.
Workshop of Self-supervised Learning, Vision and Learning Seminar (VALSE), 2023 (spotlight).
MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving
IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving
Datasets and Benchmarks Track, Neural Information Processing Systems (NeurIPS), 2021.
Program committee/Organizer:
CVPR 2025 Travel Awards
HKUST Research Travel Grant
HKUST Postgraduate Scholarship
Outstanding Graduate of Shanghai [post]
Scholarship for Outstanding Graduates of Fudan University
Overseas Visiting Student Stipend of Fudan University
Joel & Ruth Spira Scholarship
National Scholarship
Scholarship for Outstanding Undergraduates of Fudan University
I love basketball, and I'm a big fan of Stephen Curry, the MVP point guard of the NBA's Golden State Warriors. I play on my class's basketball team, usually as a small forward or power forward (SF/PF). In my spare time, I also referee basketball games. I hope that one day I'll get the chance to watch a Warriors home game at Chase Center in San Francisco!