Kai Chen's Homepage

Email / CV / Github / Twitter / Google Scholar

About Me

I am currently a Ph.D. candidate in the CSE department of the Hong Kong University of Science and Technology (HKUST), supervised by Prof. Dit-Yan Yeung. Previously, I was an undergraduate student majoring in Computer Science at Fudan University, supervised by Prof. Yanwei Fu, where I was honored as an Outstanding Graduate of Shanghai (上海市优秀毕业生). My research interests include Machine Learning and Artificial Intelligence, with the aim of building generalizable AI systems from a data-centric perspective. Currently, I'm trying to answer three questions: 1) Does more data always result in better performance? 2) How can we generate corner cases with generative models? 3) How can we fix corner cases with minimal human intervention?

👋 I'm currently on the job market for both academia and industry. Feel free to email me if we are a good fit!

Some recent works include:

AIGC Helpfulness - Omni-modal Large Language Model: EMOVA.
AIGC Harmfulness - Data Flywheel for (M)LLM Alignment: Mistake analysis, MoTE, ECSO.
AIGC Helpfulness - Mixture of Cluster-conditional Experts (MoCE): MoCLE, MoCE, SDR.
AIGC Helpfulness - Controllable Perception Corner Case Generation: GeoDiffusion, MagicDrive, TrackDiffusion, Geom-Erasing, DetDiffusion, MagicDrive3D (generative models), CODA, CODA-LM (corner case dataset).
AIGC Helpfulness - Object-level Self-supervised Learning: MixedAE, MultiSiam.

News

[2025.06] [New!] We release RACRO, an efficient and scalable method for building multi-modal reasoning models that can flexibly adapt to any advanced reasoning LLM at inference time. Welcome to try our demo!
[2025.05] [New!] I gave a talk on EMOVA at AI TIME! Check the recording (Chinese) here.
[2025.03] [New!] EMOVA, a frontier end-to-end omni-modal model with SoTA vision-language and speech abilities, has been accepted by CVPR 2025 and fully open-sourced!
[2025.06] One paper accepted by ICCV 2025! See you in Hawaii!
[2025.06] One paper accepted by TMLR 2025!
[2025.05] One paper accepted by ACL 2025! See you in Vienna!
[2025.03] Invited to serve as a reviewer for NeurIPS 2025, ICCV 2025, ACM MM 2025!
[2025.02] One paper accepted by CVPR 2025! See you in Nashville!
[2024.12] Invited to serve as an area chair for IJCAI 2025!
[2024.12] Invited to serve as a reviewer for ICML 2025!
[2024.10] We announce EMOVA, a novel end-to-end omni-modal model (i.e., w/o ASR or TTS) with SoTA vision-language and speech abilities, further supporting emotional dialogue!
[2024.10] Two papers accepted by WACV 2025! See you in Tucson!
[2024.09] We will hold the Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving: Towards Next-Generation Solutions at ECCV 2024! Looking forward to seeing you in Milano, Italy!
[2024.08] Invited to serve as a reviewer for ICLR 2025!
[2024.07] Two papers accepted by ECCV 2024! See you in Milano, Italy!
[2024.05] I gave a talk about Geometric-controllable Visual Generation: A Systematic Solution (GeoDiffusion, TrackDiffusion, MagicDrive, W-CODA2024) at VALSE Webinar!
[2024.04] I gave a talk about corner case generation for autonomous driving (GeoDiffusion, TrackDiffusion, MagicDrive, DetDiffusion) at AIDriver!
[2024.04] CODA-LM, the new multi-modal version of CODA, is online!
[2024.03] Invited to serve as a reviewer for NeurIPS 2024, TCSVT!
[2024.02] One paper accepted by CVPR 2024! See you in Seattle!
[2024.02] I gave a talk about our ICLR 2024 work Mistake Analysis at AI TIME!
[2024.02] Code and checkpoints of GeoDiffusion and MagicDrive have been released. Feel free to try them!
[2024.01] I gave a talk about our ICLR 2024 work Mistake Analysis at TechBeat!
[2024.01] Three papers accepted by ICLR 2024! See you in Vienna!
[2024.01] Invited to serve as a reviewer for TPAMI, ECCV 2024, ACCV 2024!
[2023.12] Our MoCLE is covered by Liangziwei!
[2023.12] Our MoCLE, the first MLLM with MoE architecture for instruction customization and generalization, is on arXiv!
[2023.12] The first stage of our controllable perception data generation series of works, GeoDiffusion (2D detection), MagicDrive (3D detection), TrackDiffusion (video tracking) and Geom-Erasing (concept removal), is on arXiv!
[2023.12] Recent surveys [1][2] show that the remarkable GPT-4V still suffers from corner cases from our CODA dataset!
[2023.12] Invited to serve as a reviewer for IJCAI 2024, CVPR 2024, ICLR 2024!
[2023.10] Our MagicDrive is covered by Xinzhiyuan, and Mistake Analysis is covered by Liangziwei!
[2023.05] Our papers MixedAE (CVPR 2023), MoCE (ICLR 2023) and CODA (ECCV 2022) will be presented at VALSE 2023! See you in Wuxi!
[2023.05] One paper accepted by Workshop of Self-supervised Learning, VALSE 2023 (spotlight)!
[2023.05] One paper accepted by Workshop of Autonomous Driving, VALSE 2023 (spotlight)!
[2023.03] Invited to serve as a reviewer for NeurIPS 2023!
[2023.02] One paper accepted by CVPR 2023! See you in Vancouver!
[2023.01] One paper accepted by ICLR 2023 (spotlight Top25%)! Happy Lunar New Year!
[2023.01] Invited to serve as a reviewer for ICCV 2023, IJCAI 2023!
[2022.11] Invited to serve as a reviewer for CVPR 2023!
[2022.08] Our CODA dataset will be utilized to hold the 2nd SSLAD ECCV 2022 workshop and competition at CodaLab!
[2022.08] Invited to serve as a reviewer for ICLR 2023!
[2022.07] One paper accepted by ECCV 2022!
[2022.06] Invited to serve as a reviewer for TIP!
[2022.05] Invited to serve as a reviewer for NeurIPS 2022, ECCV 2022!
[2021.12] One paper accepted by AAAI 2022!
[2021.11] Invited to serve as a reviewer for CVPR 2022, ICRA 2022 and AAAI 2022!
[2021.10] One paper accepted by NeurIPS 2021!
[2021.07] One paper accepted by ICCV 2021!
[2021.07] Our SODA10M dataset will be utilized to hold the SSLAD ICCV 2021 workshop on Self-supervised Learning for Next-Generation Industry-level Autonomous Driving. All are welcome!
[2021.06] Invited to serve as a reviewer for NeurIPS 2021!
[2020.06] Successfully defended my undergraduate thesis!
[2020.03] One paper accepted by IEEE Access!
[2019.06] One paper accepted by IROS 2019!

Selected Publications

Full publication list on Google Scholar. (* denotes equal contribution)

AIGC Harmfulness - Data Flywheel for (M)LLM Alignment

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

Yunhao Gou*, Kai Chen*, Zhili Liu*, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James Kwok, Yu Zhang.

Free data engine for MLLM alignment on its own LLM!

European Conference on Computer Vision (ECCV), 2024.

[PDF] [Project page]

Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

Zhili Liu*, Yunhao Gou*, Kai Chen*, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang, Zhenguo Li, Xin Jiang, Qun Liu, James T. Kwok.

Self-alignment with MoE-empowered CoT multi-dimensional analysis!

Arxiv preprint, 2024.

[PDF]

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

Kai Chen*, Chunwei Wang*, Kuo Yang, Jianhua Han, Lanqing Hong, Fei Mi, Hang Xu, Zhengying Liu, Wenyong Huang, Zhenguo Li, Dit-Yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu.

Enhancing LLM's generation ability via its own discrimination ability!

International Conference on Learning Representations (ICLR), 2024.

[PDF] [Wechat Post] [Talk]

AIGC Helpfulness - Mixture of Cluster-conditional Experts

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

Yunhao Gou*, Zhili Liu*, Kai Chen*, Lanqing Hong, Hang Xu, Aoxue Li, Dit-Yan Yeung, James Kwok, Yu Zhang.

First MLLM with MoE for instruction customization and generalization!

Arxiv preprint, 2023.

[PDF] [Project page] [Wechat Post] [Talk]

Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts

Zhili Liu*, Kai Chen*, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James Kwok.

International Conference on Learning Representations (ICLR), 2023 (spotlight Top25%).

[PDF][Wechat Post]

Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing.

Zhili Liu, Jianhua Han, Kai Chen, Lanqing Hong, Hang Xu, Chunjing Xu, Zhenguo Li.

AAAI Conference on Artificial Intelligence (AAAI), 2022.

[PDF]

AIGC Helpfulness - Multi-modal Corner Case Datasets for Autonomous Driving

Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

Kai Chen*, Yanze Li*, Wenhua Zhang*, Yanxin Liu, Pengxiang Li, Ruiyuan Gao, Lanqing Hong, Meng Tian, Xinhai Zhao, Zhenguo Li, Dit-Yan Yeung, Huchuan Lu, and Xu Jia.

First multi-modal corner case dataset for autonomous driving!

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025.

[PDF] [Project page] [ECCV 2024 Workshop]

CODA: A Real-World Corner Case Dataset for Object Detection in Autonomous Driving

Kaican Li*, Kai Chen*, Haoyu Wang*, Lanqing Hong, Chaoqiang Ye, Jianhua Han, Yukuai Chen, Wei Zhang, Chunjing Xu, Dit-Yan Yeung, Xiaodan Liang, Zhenguo Li, Hang Xu.

First large-scale real-life road corner case dataset!

European Conference on Computer Vision (ECCV), 2022.

Workshop of Autonomous Driving, Vision and Learning Seminar (VALSE), 2023 (spotlight).

[PDF] [Website] [Talk] [ECCV 2022 Workshop] [GPT-4V still suffers from CODA]

AIGC Helpfulness - Controllable Perception Corner Case Generation

MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, Qiang Xu.

Any-view street scene generation!

Arxiv preprint, 2024.

[PDF] [Project page]

Implicit Concept Removal of Diffusion Models

Zhili Liu*, Kai Chen*, Yifan Zhang, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James Kwok.

Geometric-controllable concept eraser for diffusion models!

European Conference on Computer Vision (ECCV), 2024.

[PDF] [Project page] [Talk]

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

Yibo Wang*, Ruiyuan Gao*, Kai Chen*, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-Yan Yeung, Qiang Xu, Kai Zhang.

First personalized corner case generation work for object detection!

IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2024.

[PDF] [Wechat Post]

TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models

Pengxiang Li*, Kai Chen*, Zhili Liu*, Ruiyuan Gao, Lanqing Hong, Guo Zhou, Hua Yao, Dit-Yan Yeung, Huchuan Lu, Xu Jia.

First tracklet-conditioned world model for multi-object tracking!

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025.

[PDF][Project page]

MagicDrive: Street View Generation with Diverse 3D Geometry Control

Ruiyuan Gao*, Kai Chen*, Enze Xie, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung, Qiang Xu.

First multi-view video generation work for 3D detection!

International Conference on Learning Representations (ICLR), 2024.

[PDF][Project page][Wechat Post][Talk] [ECCV 2024 Workshop]

GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation

Kai Chen*, Enze Xie*, Zhe Chen, Yibo Wang, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung.

First geometric-controllable work for 2D detection!

International Conference on Learning Representations (ICLR), 2024.

[PDF][Project page]

AIGC Helpfulness - Object-level Self-supervised Learning

Mixed Autoencoder for Self-supervised Visual Representation Learning

Kai Chen*, Zhili Liu*, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung.

IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2023.

Workshop of Self-supervised Learning, Vision and Learning Seminar (VALSE), 2023 (spotlight).

[PDF][Wechat Post][Talk]

MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving

Kai Chen, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung.

IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

[PDF] [Zhihu]

SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving.

Jianhua Han, Xiwen Liang, Hang Xu, Kai Chen, Lanqing Hong, Jiageng Mao, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Xiaodan Liang, Chunjing Xu.

Datasets and Benchmarks Track, Neural Information Processing Systems (NeurIPS), 2021.

[PDF] [Website] [Talk] [ICCV 2021 Workshop]

Talks

[AI TIME Online] EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions. [Recording]
[VALSE Webinar] Geometric-controllable Visual Generation: A Systematic Solution. [Recording]
[AIDriver Online] Controllable Corner Case Generation for Autonomous Driving. [Recording]
[AI TIME Online] Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis. [Recording]
[TechBeat Online] Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis. [Recording]
[VALSE 2023@Wuxi] Mixed Autoencoder for Self-supervised Visual Representation Learning. [Recording]
[VALSE 2023@Wuxi] CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving. [Recording]

Experiences

Indiana University Bloomington

Indiana, U.S.A.

June 2019 - Sep. 2019

Visiting Scholar at the Computer Vision Lab, supervised by Prof. David Crandall

University of Manchester

Manchester, U.K.

Sep. 2018 - Jan. 2019

International exchange student, supervised by Dr. Tingting Mu

Academic Services

Program committee/Organizer:

The Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving at ECCV 2024.
The 2nd SSLAD Workshop at ECCV 2022.
The 1st SSLAD (Self-supervised Learning for Next-generation Industry-level Autonomous Driving) Workshop at ICCV 2021.

Area chair:

Conference: IJCAI 2025.

Reviewer:

Conference: NeurIPS 2025/2024/2023/2022/2021, MM 2025, ICCV 2025/2023, ICML 2025, CVPR 2025/2024/2023/2022, ICLR 2025/2024/2023, ECCV 2024/2022, ACCV 2024, IJCAI 2024/2023, ICRA 2022, AAAI 2022.
Journal: TPAMI, TCSVT, TIP and IEEE Access.

Selected Awards

HKUST Research Travel Grant

2023

HKUST Postgraduate Scholarship

2020

Outstanding Graduate of Shanghai [post]

2020

Scholarship for Outstanding Graduates of Fudan University

2020

Joel & Ruth Spira Scholarship

2019

Oversea Visiting Student Stipend of Fudan University

2019

National Scholarship

2018

Scholarship for Outstanding Undergraduates of Fudan University

2017

Interests

I love basketball and am a big fan of Stephen Curry, MVP point guard of the Golden State Warriors in the NBA. I'm a member of my class's basketball team and often play small forward or power forward (SF/PF). In my spare time, I also serve as a basketball referee. I hope one day I will have a chance to see a Warriors home game at Chase Center in San Francisco!

Kai Chen (陈铠)

Ph.D. Candidate @ HKUST