GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation

1Hong Kong University of Science and Technology, 2Huawei Noah's Ark Lab,
3Nanjing University, 4Tsinghua University
(*Equal contribution. Corresponding author.)

(Left) GeoDiffusion supports various geometric conditions (bboxes & camera views) with a unified architecture. (Right) GeoDiffusion-generated images can be beneficial for object detector training.

Abstract

Diffusion models have attracted significant attention due to their remarkable content creation ability. However, using diffusion models to generate high-quality object detection data remains an underexplored area, where not only image-level perceptual quality but also geometric conditions such as bounding boxes and camera views are essential.

We propose GeoDiffusion, a simple framework that flexibly translates various geometric conditions into text prompts and empowers pre-trained text-to-image (T2I) diffusion models for high-quality detection data generation. Unlike previous methods, GeoDiffusion can encode not only bounding boxes but also extra geometric conditions such as camera views in self-driving scenes.
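The core idea of translating geometric conditions into text can be sketched as follows: discretize bounding-box coordinates into a fixed set of location bins and serialize them, together with the camera view, into a plain text prompt. This is a minimal illustration of the concept only; the function names, token format (`<l..>`), and bin count below are assumptions for this sketch, not the authors' exact implementation.

```python
def bbox_to_tokens(bbox, image_size=(512, 512), num_bins=256):
    """Discretize (x_min, y_min, x_max, y_max) pixel coordinates into
    location-bin tokens. Hypothetical helper for illustration."""
    w, h = image_size
    x0, y0, x1, y1 = bbox
    bins = [
        min(int(x0 / w * num_bins), num_bins - 1),
        min(int(y0 / h * num_bins), num_bins - 1),
        min(int(x1 / w * num_bins), num_bins - 1),
        min(int(y1 / h * num_bins), num_bins - 1),
    ]
    return "".join(f"<l{b}>" for b in bins)


def build_prompt(camera_view, objects, image_size=(512, 512)):
    """Serialize a camera view plus (category, bbox) pairs into one
    text prompt that a T2I diffusion model can condition on."""
    parts = [f"camera view: {camera_view}."]
    for category, bbox in objects:
        parts.append(f"{category} {bbox_to_tokens(bbox, image_size)}")
    return " ".join(parts)


prompt = build_prompt(
    "front",
    [("car", (64, 128, 192, 256)), ("pedestrian", (300, 200, 340, 300))],
)
print(prompt)
# → camera view: front. car <l32><l64><l96><l128> pedestrian <l150><l100><l170><l150>
```

Because the geometric conditions end up as ordinary text, no architectural change to the diffusion model is required; only the prompt (and, in practice, the token embeddings) carries the layout information.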

This is the first work to adopt diffusion models for layout-to-image (L2I) generation with geometric conditions and to show that L2I-generated images can be beneficial for training object detectors.

3D Geometric Controls

Camera View Control

Domain Adaptation

BibTeX

@article{chen2023integrating,
  title={Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt},
  author={Chen, Kai and Xie, Enze and Chen, Zhe and Hong, Lanqing and Li, Zhenguo and Yeung, Dit-Yan},
  journal={arXiv preprint arXiv:2306.04607},
  year={2023}
}