
Kai Chen (陈铠)

Ph.D. Candidate @ HKUST

Email  /  CV  /  Github  /  Twitter  /  Google Scholar
About Me

I am currently a Ph.D. candidate in the CSE department of the Hong Kong University of Science and Technology (HKUST), supervised by Prof. Dit-Yan Yeung. Previously, I was an undergraduate student majoring in Computer Science at Fudan University, supervised by Prof. Yanwei Fu, where I was honored as an Outstanding Graduate of Shanghai (上海市优秀毕业生). My research interests include Machine Learning and Artificial Intelligence, with the aim of building generalizable AI systems from a data-centric perspective. Currently, I'm trying to answer three questions: 1) Does more data always result in better performance? 2) How can we generate corner cases with generative models? 3) How can we fix corner cases with minimal human intervention?

👋 I'm on the job market for both academia and industry for Fall 2025. Feel free to email me if we are a good fit!

Some recent works include:

News
  • [2024.10] [New!] We announce EMOVA, a novel end-to-end omni-modal model (i.e., without ASR or TTS) with SoTA vision-language and speech abilities that further supports emotional dialogue!
  • [2024.10] Two papers accepted by WACV 2025! See you in Tucson!
  • [2024.09] We will hold the Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving: Towards Next-Generation Solutions at ECCV 2024! Looking forward to seeing you in Milano, Italy!
  • [2024.08] Invited to serve as a reviewer for ICLR 2025!
  • [2024.07] Two papers accepted by ECCV 2024! See you in Milano, Italy!
  • [2024.05] I gave a talk on Geometric-controllable Visual Generation: A Systematic Solution (GeoDiffusion, TrackDiffusion, MagicDrive, W-CODA2024) at the VALSE Webinar!
  • [2024.04] I gave a talk on corner case generation for autonomous driving (GeoDiffusion, TrackDiffusion, MagicDrive, DetDiffusion) at AIDriver!
  • [2024.04] CODA-LM, the new multi-modal version of CODA, is online!
  • [2024.03] Invited to serve as a reviewer for NeurIPS 2024 and TCSVT!
  • [2024.02] One paper accepted by CVPR 2024! See you in Seattle!
  • [2024.02] I gave a talk on our ICLR 2024 work Mistake Analysis at AI TIME!
  • [2024.02] Code and checkpoints of GeoDiffusion and MagicDrive have been released. Feel free to try them out!
  • [2024.01] I gave a talk on our ICLR 2024 work Mistake Analysis at TechBeat!
  • [2024.01] Three papers accepted by ICLR 2024! See you in Vienna!
  • [2024.01] Invited to serve as a reviewer for TPAMI, ECCV 2024, ACCV 2024!
  • [2023.12] Our MoCLE is featured by Liangziwei!
  • [2023.12] Our MoCLE, the first MLLM with MoE architecture for instruction customization and generalization, is on arXiv!
  • [2023.12] The first stage of our controllable perception data generation series, comprising GeoDiffusion (2D detection), MagicDrive (3D detection), TrackDiffusion (video tracking) and Geom-Erasing (concept removal), is on arXiv!
  • [2023.12] Recent surveys [1][2] show that even the remarkable GPT-4V still suffers from corner cases in our CODA dataset!
  • [2023.12] Invited to serve as a reviewer for IJCAI 2024, CVPR 2024, ICLR 2024!
  • [2023.10] Our MagicDrive is featured by Xinzhiyuan, and Mistake Analysis is featured by Liangziwei!
  • [2023.05] Our papers MixedAE (CVPR 2023), MoCE (ICLR 2023) and CODA (ECCV 2022) will be presented at VALSE 2023! See you in Wuxi!
  • [2023.05] One paper accepted by the Workshop of Self-supervised Learning, VALSE 2023 (spotlight)!
  • [2023.05] One paper accepted by the Workshop of Autonomous Driving, VALSE 2023 (spotlight)!
  • [2023.03] Invited to serve as a reviewer for NeurIPS 2023!
  • [2023.02] One paper accepted by CVPR 2023! See you in Vancouver!
  • [2023.01] One paper accepted by ICLR 2023 (spotlight Top25%)! Happy Lunar New Year!
  • [2023.01] Invited to serve as a reviewer for ICCV 2023, IJCAI 2023!
  • [2022.11] Invited to serve as a reviewer for CVPR 2023!
  • [2022.08] Our CODA dataset will be used for the 2nd SSLAD workshop at ECCV 2022 and its competition on CodaLab!
  • [2022.08] Invited to serve as a reviewer for ICLR 2023!
  • [2022.07] One paper accepted by ECCV 2022!
  • [2022.06] Invited to serve as a reviewer for TIP!
  • [2022.05] Invited to serve as a reviewer for NeurIPS 2022, ECCV 2022!
  • [2021.12] One paper accepted by AAAI 2022!
  • [2021.11] Invited to serve as a reviewer for CVPR 2022, ICRA 2022 and AAAI 2022!
  • [2021.10] One paper accepted by NeurIPS 2021!
  • [2021.07] One paper accepted by ICCV 2021!
  • [2021.07] Our SODA10M dataset will be used for the SSLAD workshop at ICCV 2021 on Self-supervised Learning for Next-Generation Industry-level Autonomous Driving. All are welcome!
  • [2021.06] Invited to serve as a reviewer for NeurIPS 2021!
  • [2020.06] Successfully defended my undergraduate thesis!
  • [2020.03] One paper accepted by IEEE Access!
  • [2019.06] One paper accepted by IROS 2019!
Selected Publications

Full publication list on Google Scholar. (* denotes equal contribution)

AIGC Harmfulness - Data Flywheel for (M)LLM Alignment

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

Yunhao Gou*, Kai Chen*, Zhili Liu*, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James Kwok, Yu Zhang.

A free data engine for MLLM alignment, powered by its own LLM!

European Conference on Computer Vision (ECCV), 2024.

[PDF] [Project page]

Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

Zhili Liu*, Yunhao Gou*, Kai Chen*, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang, Zhenguo Li, Xin Jiang, Qun Liu, James T. Kwok.

Self-alignment with MoE-empowered CoT multi-dimensional analysis!

arXiv preprint, 2024.

[PDF]

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

Kai Chen*, Chunwei Wang*, Kuo Yang, Jianhua Han, Lanqing Hong, Fei Mi, Hang Xu, Zhengying Liu, Wenyong Huang, Zhenguo Li, Dit-Yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu.

Enhancing an LLM's generation ability via its own discrimination ability!

International Conference on Learning Representations (ICLR), 2024.

[PDF] [WeChat Post] [Talk]
AIGC Helpfulness - Mixture of Cluster-conditional Experts

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

Yunhao Gou*, Zhili Liu*, Kai Chen*, Lanqing Hong, Hang Xu, Aoxue Li, Dit-Yan Yeung, James Kwok, Yu Zhang.

First MLLM with MoE for instruction customization and generalization!

arXiv preprint, 2023.

[PDF] [Project page] [WeChat Post] [Talk]

Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts

Zhili Liu*, Kai Chen*, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James Kwok.

International Conference on Learning Representations (ICLR), 2023 (spotlight Top25%).

[PDF] [WeChat Post]

Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing

Zhili Liu, Jianhua Han, Kai Chen, Lanqing Hong, Hang Xu, Chunjing Xu, Zhenguo Li.

AAAI Conference on Artificial Intelligence (AAAI), 2022.

[PDF]
AIGC Helpfulness - Multi-modal Corner Case Datasets for Autonomous Driving

Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

Kai Chen*, Yanze Li*, Wenhua Zhang*, Yanxin Liu, Pengxiang Li, Ruiyuan Gao, Lanqing Hong, Meng Tian, Xinhai Zhao, Zhenguo Li, Dit-Yan Yeung, Huchuan Lu, Xu Jia.

First multi-modal corner case dataset for autonomous driving!

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025.

[PDF] [Project page] [ECCV 2024 Workshop]

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

Kaican Li*, Kai Chen*, Haoyu Wang*, Lanqing Hong, Chaoqiang Ye, Jianhua Han, Yukuai Chen, Wei Zhang, Chunjing Xu, Dit-Yan Yeung, Xiaodan Liang, Zhenguo Li, Hang Xu.

First large-scale real-life road corner case dataset!

European Conference on Computer Vision (ECCV), 2022.

Workshop of Autonomous Driving, Vision and Learning Seminar (VALSE), 2023 (spotlight).

[PDF] [Website] [Talk] [ECCV 2022 Workshop] [GPT-4V still suffers from CODA]
AIGC Helpfulness - Controllable Perception Corner Case Generation

MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, Qiang Xu.

Any-view street scene generation!

arXiv preprint, 2024.

[PDF] [Project page]

Implicit Concept Removal of Diffusion Models

Zhili Liu*, Kai Chen*, Yifan Zhang, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James Kwok.

Geometric-controllable concept eraser for diffusion models!

European Conference on Computer Vision (ECCV), 2024.

[PDF] [Project page] [Talk]

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

Yibo Wang*, Ruiyuan Gao*, Kai Chen*, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-Yan Yeung, Qiang Xu, Kai Zhang.

First personalized corner case generation work for object detection!

IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2024.

[PDF] [WeChat Post]

TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models

Pengxiang Li*, Kai Chen*, Zhili Liu*, Ruiyuan Gao, Lanqing Hong, Guo Zhou, Hua Yao, Dit-Yan Yeung, Huchuan Lu, Xu Jia.

First tracklet-conditioned world model for multi-object tracking!

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025.

[PDF] [Project page]

MagicDrive: Street View Generation with Diverse 3D Geometry Control

Ruiyuan Gao*, Kai Chen*, Enze Xie, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung, Qiang Xu.

First multi-view video generation work for 3D detection!

International Conference on Learning Representations (ICLR), 2024.

[PDF] [Project page] [WeChat Post] [Talk] [ECCV 2024 Workshop]

GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation

Kai Chen*, Enze Xie*, Zhe Chen, Yibo Wang, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung.

First geometric-controllable work for 2D detection!

International Conference on Learning Representations (ICLR), 2024.

[PDF] [Project page]
AIGC Helpfulness - Object-level Self-supervised Learning

Mixed Autoencoder for Self-supervised Visual Representation Learning

Kai Chen*, Zhili Liu*, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung.

IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2023.

Workshop of Self-supervised Learning, Vision and Learning Seminar (VALSE), 2023 (spotlight).

[PDF] [WeChat Post] [Talk]

MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving

Kai Chen, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung.

IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

[PDF] [Zhihu]

SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving

Jianhua Han, Xiwen Liang, Hang Xu, Kai Chen, Lanqing Hong, Jiageng Mao, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Xiaodan Liang, Chunjing Xu.

Datasets and Benchmarks Track, Neural Information Processing Systems (NeurIPS), 2021.

[PDF] [Website] [Talk] [ICCV 2021 Workshop]
Talks
  • [VALSE Webinar] Geometric-controllable Visual Generation: A Systematic Solution. [Recording]
  • [AIDriver Online] Controllable Corner Case Generation for Autonomous Driving. [Recording]
  • [AI TIME Online] Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis. [Recording]
  • [TechBeat Online] Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis. [Recording]
  • [VALSE 2023@Wuxi] Mixed Autoencoder for Self-supervised Visual Representation Learning. [Recording]
  • [VALSE 2023@Wuxi] CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving. [Recording]
Experiences
Indiana University Bloomington
Indiana, U.S.A.
June 2019 - Sep. 2019
Visiting Scholar at the Computer Vision Lab, supervised by Prof. David Crandall
University of Manchester
Manchester, U.K.
Sep. 2018 - Jan. 2019
International exchange student, supervised by Dr. Tingting Mu
Academic Services

Program committee/Organizer:

  • The Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving at ECCV 2024.
  • The 2nd SSLAD Workshop at ECCV 2022.
  • The 1st SSLAD (Self-supervised Learning for Next-generation Industry-level Autonomous Driving) Workshop at ICCV 2021.
Reviewer:
  • Conference: ICLR 2025/2024/2023, NeurIPS 2024/2023/2022/2021, ECCV 2024/2022, ACCV 2024, IJCAI 2024/2023, CVPR 2024/2023/2022, ICCV 2023, ICRA 2022, AAAI 2022.
  • Journal: TPAMI, TCSVT, TIP and IEEE Access.
Selected Awards

HKUST Research Travel Grant

2023

HKUST Postgraduate Scholarship

2020

Outstanding Graduate of Shanghai [post]

2020

Scholarship for Outstanding Graduates of Fudan University

2020

Joel & Ruth Spira Scholarship

2019

Overseas Visiting Student Stipend of Fudan University

2019

National Scholarship

2018

Scholarship for Outstanding Undergraduates of Fudan University

2017
Interests

I love basketball, and I'm a big fan of Stephen Curry, the MVP point guard of the NBA's Golden State Warriors. I'm a member of my class's basketball team and often play small forward or power forward (SF/PF). In my spare time, I also referee basketball games. I hope that one day I'll have the chance to watch a Warriors home game at Chase Center in San Francisco!