Geom-Erasing: Implicit Concept Removal of Diffusion Models

¹Hong Kong University of Science and Technology, ²Huawei Noah's Ark Lab, ³National University of Singapore

(*Equal contribution. †Corresponding authors.)

🔥 Geometric-controllable concept eraser for diffusion models!

Abstract

Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images. These concepts, termed "implicit concepts", can be unintentionally learned during training and then generated uncontrollably during inference. Existing removal methods still struggle to eliminate implicit concepts, primarily because they depend on the model's ability to recognize concepts that it cannot actually discern.

To address this, we exploit the intrinsic geometric characteristics of implicit concepts and present Geom-Erasing, a novel concept removal method based on geometric-driven control. Specifically, once an implicit concept is identified, we integrate the existence and geometric information of the concept into the text prompts with the help of an accessible classifier or detector model. Subsequently, the model is optimized to identify and disentangle this information, which is then adopted as negative prompts during generation.

Moreover, we introduce the Implicit Concept Dataset (ICD), a novel image-text dataset imbued with three typical implicit concepts (i.e., QR codes, watermarks, and text), reflecting real-life situations where implicit concepts are easily injected. Geom-Erasing effectively mitigates the generation of implicit concepts, achieving state-of-the-art results on the Inappropriate Image Prompts (I2P) benchmark and on our challenging ICD benchmark.

Implicit Concept Problem in Stable Diffusion

Stable Diffusion v1-5 surprisingly generates images with watermarks and unsafe content even though these implicit concepts are not mentioned in the text prompt. We define implicit concepts (IC) as concepts that are not explicitly specified in the text prompts but are still generated by diffusion models (DMs).

Architecture of Geom-Erasing

The pipeline begins with an original image that may contain multiple distinct implicit concepts. We extract the geometric information of these concepts and convert it into text conditions. Special location tokens, each representing one of the bins into which the original image is discretized, are added to the original text vocabulary. Text prompts are updated by appending the location tokens corresponding to the areas covered by the concept. Loss re-weighting is employed so that training concentrates more on areas free of implicit concepts. During sampling, the learned tokens are supplied as negative prompts, resulting in generated images free from implicit concepts.
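The prompt-side steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the token naming (`<loc_k>`), the square `grid` of bins, the box format, and the down-weighting value `concept_weight` are all hypothetical choices made for the example, and the bounding boxes are assumed to come from some external detector.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, as a detector might return.
Box = Tuple[float, float, float, float]


def covered_bins(boxes: Sequence[Box], width: int, height: int, grid: int) -> List[int]:
    """Return sorted indices of the grid x grid bins that any box overlaps."""
    cell_w, cell_h = width / grid, height / grid
    bins = set()
    for x0, y0, x1, y1 in boxes:
        # First and last bin touched along each axis, clamped to the grid.
        c0 = max(0, math.floor(x0 / cell_w))
        c1 = min(grid - 1, math.ceil(x1 / cell_w) - 1)
        r0 = max(0, math.floor(y0 / cell_h))
        r1 = min(grid - 1, math.ceil(y1 / cell_h) - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                bins.add(r * grid + c)  # row-major bin index
    return sorted(bins)


def augment_prompt(caption: str, bins: Sequence[int]) -> str:
    """Append one (hypothetical) location token per covered bin to the caption."""
    tokens = " ".join(f"<loc_{b}>" for b in bins)
    return f"{caption} {tokens}" if tokens else caption


def loss_weights(bins: Sequence[int], grid: int, concept_weight: float = 0.5) -> List[List[float]]:
    """Per-bin loss weights: bins covered by the concept get a smaller weight.

    The value 0.5 is an assumed placeholder, not a setting from the source.
    """
    covered = set(bins)
    return [
        [concept_weight if r * grid + c in covered else 1.0 for c in range(grid)]
        for r in range(grid)
    ]


if __name__ == "__main__":
    # A watermark detected in the bottom-right corner of a 512x512 image.
    bins = covered_bins([(384, 384, 512, 512)], width=512, height=512, grid=4)
    print(augment_prompt("a cat on a sofa", bins))
    print(loss_weights(bins, grid=4))
```

In this sketch the weight grid would be upsampled or matched to the resolution at which the training loss is computed, and the same location tokens would later be passed as the negative prompt at sampling time; both of those steps depend on the training framework and are left out here.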

Qualitative Comparison

(Left) Erasing implicit concepts from SD. We successfully remove watermark and toxicity concepts from generated images while retaining other content. (Right) Erasing implicit concepts in ICD. The top group of images is generated by a model fine-tuned on ICD-QR. The middle and bottom groups are generated by models fine-tuned on ICD-Watermark and ICD-Text, respectively.

BibTeX