TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models

Setup

We ran our experiments in the following environment:

python: 3.10
torch: 2.1.2
CUDA Version: 12.4

Please run pip install -r requirement.txt to install dependency packages.
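
After installing, you may want to confirm that your environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the helper names (python_matches, torch_matches, describe_environment) are hypothetical, and it only assumes that torch exposes __version__, version.cuda and cuda.is_available().

```python
import importlib
import sys

# Versions listed in the Setup section above.
EXPECTED_PYTHON = (3, 10)
EXPECTED_TORCH = "2.1.2"


def python_matches(version_info=sys.version_info, expected=EXPECTED_PYTHON):
    """Return True when the interpreter's major.minor equals the expected pair."""
    return tuple(version_info[:2]) == tuple(expected)


def torch_matches(installed, expected=EXPECTED_TORCH):
    """Compare torch versions, ignoring local build suffixes such as '+cu121'."""
    return str(installed).split("+")[0] == expected


def describe_environment():
    """Collect human-readable lines describing the current environment."""
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    status = "ok" if python_matches() else "differs"
    lines = [f"python {py} (expected 3.10): {status}"]
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        lines.append("torch is not installed")
        return lines
    status = "ok" if torch_matches(torch.__version__) else "differs"
    lines.append(f"torch {torch.__version__} (expected {EXPECTED_TORCH}): {status}")
    # torch.version.cuda is the CUDA toolkit torch was built with (None on CPU builds).
    lines.append(f"torch built for CUDA: {torch.version.cuda}")
    lines.append(f"CUDA device visible: {torch.cuda.is_available()}")
    return lines


if __name__ == "__main__":
    print("\n".join(describe_environment()))
```

The sketch does not compare the CUDA value against 12.4, because the CUDA toolkit that torch was built with can differ from the version reported by the driver.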

The erased model can be found :hugs:here. Our implementation currently supports only SD1.4. We plan to release TRCE implementations for newer models in the future.

Run

You can find the pre-cached COCO embeddings :hugs:here. Please download the cache directory and place it in data/cache.

Run stage-1 TRCE

In the first stage, TRCE applies a closed-form edit to the cross-attention layers. To run it:

# for erasing "sexual"
python run_trce_stage1.py config/stage1/stage1_sexual_default.yaml

# for erasing multiple malicious concepts
python run_trce_stage1.py config/stage1/stage1_unsafe_default.yaml

You can modify the base model path and the output directory for the first-stage fine-tuned model in the configuration files.
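
For intuition about what a "closed-form edit" of a projection matrix looks like, here is a toy sketch in pure Python. It is not the objective implemented in run_trce_stage1.py: it shows only the generic regularized least-squares form used by the broader family of closed-form concept-editing methods, for a single concept vector. All names (matvec, closed_form_edit, lam) and the toy numbers are assumptions made for illustration.

```python
def matvec(W, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def closed_form_edit(W, c, v_target, lam=1e-3):
    """Return W' minimizing ||W'c - v_target||^2 + lam * ||W' - W||_F^2.

    Setting the gradient to zero gives W'(c c^T + lam I) = v_target c^T + lam W,
    whose solution is the rank-one update
        W' = W + (v_target - W c) c^T / (lam + c^T c).
    """
    Wc = matvec(W, c)
    scale = 1.0 / (lam + sum(c_j * c_j for c_j in c))
    residual = [t - o for t, o in zip(v_target, Wc)]
    return [
        [w_ij + r_i * c_j * scale for w_ij, c_j in zip(row, c)]
        for row, r_i in zip(W, residual)
    ]


if __name__ == "__main__":
    # Toy 2x3 projection standing in for a cross-attention key/value matrix.
    W = [[1.0, 0.0, 2.0],
         [0.0, 1.0, -1.0]]
    concept = [1.0, 0.0, 0.0]   # toy embedding of the concept to erase
    anchor = [0.0, 1.0, 0.0]    # toy embedding of a safe replacement
    target = matvec(W, anchor)  # the concept should now project like the anchor
    W_new = closed_form_edit(W, concept, target, lam=1e-6)
    print("edited concept output:", matvec(W_new, concept))
    print("unrelated direction:  ", matvec(W_new, [0.0, 0.0, 1.0]))
```

A small lam pulls the concept's output close to the target, while directions orthogonal to the concept vector are left unchanged. A larger lam keeps the edited weights closer to the original ones.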

Run stage-2 TRCE

Before the second stage, you need to prepare the denoising trajectory samples used for fine-tuning:

python stage2_data_preparation.py

This script generates samples for both "sexual" and "multi-concept" fine-tuning, as well as unconditional samples for the regularization loss.

Then, run stage 2 with one of the following commands:

# for erasing "sexual"
python run_trce_stage2.py config/stage2/stage2_sexual_default.yaml

# for erasing multiple malicious concepts
python run_trce_stage2.py config/stage2/stage2_unsafe_default.yaml
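
The steps above must run in a fixed order: stage 1, data preparation, then stage 2. The sketch below chains them for one task. It is a hypothetical wrapper, not a script shipped with the repository: the script and config paths are copied from the commands above, while build_pipeline, run_pipeline and the "sexual"/"unsafe" labels are names chosen here. It assumes you have already edited the config files so that the paths in them are correct.

```python
import subprocess
import sys

# Config paths are taken from the commands above; the dictionary keys
# are just labels chosen for this sketch.
CONFIGS = {
    "sexual": ("config/stage1/stage1_sexual_default.yaml",
               "config/stage2/stage2_sexual_default.yaml"),
    "unsafe": ("config/stage1/stage1_unsafe_default.yaml",
               "config/stage2/stage2_unsafe_default.yaml"),
}


def build_pipeline(task, python=sys.executable):
    """Return the three commands for one task, in the order they must run."""
    stage1_cfg, stage2_cfg = CONFIGS[task]
    return [
        [python, "run_trce_stage1.py", stage1_cfg],  # closed-form edit
        [python, "stage2_data_preparation.py"],      # trajectory samples
        [python, "run_trce_stage2.py", stage2_cfg],  # stage-2 fine-tuning
    ]


def run_pipeline(task, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_pipeline(task):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline("sexual", dry_run=True)
```

Because stage2_data_preparation.py generates samples for both tasks, it only needs to run once if you train both the "sexual" and the multi-concept models.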

Evaluation

The evaluation relies on the following repositories: NudeNet, Q16 Detector, Pytorch FID, and CLIP Score. Please install these repositories according to their instructions before proceeding with the evaluation.

Generate images using the erased model

First, run the following scripts with the path to the erased UNet and an output path to generate images for each evaluation task.

# for evaluating "sexual" erasure
python gen_sexual.py <erased-model-dir> <output_path>

# for evaluating "multi-concept" erasure
python gen_unsafe.py <erased-model-dir> <output_path>

# for evaluating knowledge preservation on COCO
python gen_coco.py <erased-model-dir> <output_path>
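
If you evaluate all three tasks for the same erased model, giving each task its own output subdirectory keeps the generated images separate. The following sketch builds the three commands above with that layout. The helper names, the labels and the placeholder paths are assumptions for illustration and are not part of the repository.

```python
from pathlib import Path
import subprocess
import sys

# Generation scripts named in the commands above, keyed by a label
# chosen for this sketch; each label becomes an output subdirectory.
GEN_SCRIPTS = {
    "sexual": "gen_sexual.py",
    "unsafe": "gen_unsafe.py",
    "coco": "gen_coco.py",
}


def build_gen_commands(erased_model_dir, output_root, python=sys.executable):
    """Return {label: command} with one output directory per evaluation task."""
    root = Path(output_root)
    return {
        label: [python, script, str(erased_model_dir), str(root / label)]
        for label, script in GEN_SCRIPTS.items()
    }


def generate_all(erased_model_dir, output_root, dry_run=True):
    """Print each generation command; execute it only when dry_run is False."""
    for label, cmd in build_gen_commands(erased_model_dir, output_root).items():
        print(f"[{label}] {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Both paths are placeholders; substitute your own.
    generate_all("path/to/erased-model", "outputs", dry_run=True)
```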

Then, you can follow the instructions in eval_nudenet_batch.ipynb, eval_unsafe.ipynb and eval_coco_batch.ipynb to evaluate and review the performance of the erasure.

If you encounter any issues while using this repository, please open an issue or contact me at [email protected]. I will respond as soon as possible.

Citation

@article{chen2025reliable,
    title={TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models}, 
    author={Chen, Ruidong and Guo, Honglin and Wang, Lanjun and Zhang, Chenyu and Nie, Weizhi and Liu, An-An},
    journal={arXiv preprint arXiv:2503.07389},
    year={2025}
}

Acknowledgement

We built this repository based on the excellent work of previous projects: RECE, MACE, and Safree. Thank you to all who contributed.