	---
	base_model:
	- stabilityai/stable-diffusion-3.5-large
	base_model_relation: quantized
	pipeline_tag: text-to-image
	tags:
	- dfloat11
	- df11
	- lossless compression
	- 70% size, 100% accuracy
	---

	## DFloat11 Compressed Model: `stabilityai/stable-diffusion-3.5-large`

	This is a losslessly compressed version of [`stabilityai/stable-diffusion-3.5-large`](https://huggingface.co/stabilityai/stable-diffusion-3.5-large) using our custom DFloat11 format.

	### 💡 Key Benefits

	* ✅ Bit-for-bit identical outputs to the original BFloat16 model
	* 📉 \~30% reduction in model size (16 GB → 11.3 GB)
	* 🧠 Lower memory requirements: now runs on 16 GB GPUs
	* ⚡ Minimal performance overhead: inference is only slightly slower than with the original BFloat16 model

	DFloat11 compresses the model weights while preserving full numerical precision. This allows you to run `stabilityai/stable-diffusion-3.5-large` on more accessible hardware, with no compromise in output quality.
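As a quick sanity check on the figures above, the arithmetic below derives the size ratio and the implied average number of bits per weight. It is a back-of-the-envelope sketch that assumes both sizes are in the same unit and refer to the same set of BFloat16 weights; the variable names are illustrative only.

```python
# Figures quoted in the list above (assumed to be in the same unit).
ORIGINAL_GB = 16.0
COMPRESSED_GB = 11.3
BF16_BITS = 16  # bits per weight in the original BFloat16 model

ratio = COMPRESSED_GB / ORIGINAL_GB
reduction_pct = (1 - ratio) * 100
effective_bits = BF16_BITS * ratio  # average bits per weight after compression

print(f"compressed / original size: {ratio:.3f}")
print(f"size reduction: {reduction_pct:.1f}%")
print(f"effective bits per weight: {effective_bits:.1f}")
```

The result of roughly 11 bits per weight is consistent with the "DFloat11" name and the "70% size" tagline.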

	### 🔍 How It Works

	DFloat11 compresses model weights using Huffman coding of BFloat16 exponent bits, combined with hardware-aware algorithmic designs that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are decompressed just before matrix multiplications, then immediately discarded after use to minimize memory footprint.

	Advantages:
	* Fully GPU-based: no CPU decompression or host-device data transfer.
	* DFloat11 is much faster than CPU-offloading approaches, enabling practical deployment in memory-constrained environments.
	* The compression is fully lossless, guaranteeing that the model’s outputs are bit-for-bit identical to those of the original model.
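To make the idea of Huffman-coding the exponent bits more concrete, here is a minimal CPU-only sketch. It is not the DFloat11 implementation and does not touch the GPU kernel: it draws synthetic Gaussian "weights" (an assumption standing in for real model weights), extracts the 8-bit exponent field that BFloat16 shares with float32, and estimates how many bits per weight remain if only the exponent is Huffman-coded. The helper names `bf16_exponent` and `huffman_code_lengths` are hypothetical.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """Return the 8-bit exponent field of x as float32 (same field as BFloat16).

    Ignores the rare case where rounding to BFloat16 carries into the exponent.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF


def huffman_code_lengths(freqs):
    """Map each symbol to the length (in bits) of its Huffman code."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap items: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


random.seed(0)
# Synthetic stand-in for trained weights (assumption: roughly Gaussian, small scale).
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]

counts = Counter(bf16_exponent(w) for w in weights)
lengths = huffman_code_lengths(counts)
avg_exp_bits = sum(counts[s] * lengths[s] for s in counts) / len(weights)

# BFloat16 layout: 1 sign bit + 8 exponent bits + 7 mantissa bits.
bits_per_weight = 1 + 7 + avg_exp_bits
print(f"distinct exponent values used: {len(counts)} of 256")
print(f"average Huffman-coded exponent: {avg_exp_bits:.2f} bits (vs. 8)")
print(f"estimated bits per weight: {bits_per_weight:.2f} (vs. 16)")
```

Because the sign and mantissa bits are stored unchanged and Huffman coding is reversible, a scheme like this can reconstruct every weight exactly, which is why the outputs can stay bit-for-bit identical.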

	### 🔧 How to Use

	1. Install or upgrade the DFloat11 pip package (installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed):

	```bash
	pip install -U "dfloat11[cuda12]"
	# or, if you have CUDA 11:
	# pip install -U "dfloat11[cuda11]"
	```
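If you are unsure which of the two extras applies, the following sketch prints a matching install command. It assumes that the CUDA version your PyTorch build was compiled against (`torch.version.cuda`) is the one that matters; the helper `dfloat11_extra` is hypothetical and not part of the DFloat11 package.

```python
from typing import Optional


def dfloat11_extra(cuda_version: Optional[str]) -> str:
    """Map a CUDA version string such as '12.1' to 'cuda12' or 'cuda11'."""
    if not cuda_version:
        raise ValueError("No CUDA version found; a CUDA-enabled PyTorch build is required.")
    major = int(cuda_version.split(".")[0])
    if major not in (11, 12):
        raise ValueError(f"No extra is listed above for CUDA {cuda_version}.")
    return f"cuda{major}"


def main() -> None:
    try:
        import torch  # torch.version.cuda is None on builds without CUDA
    except ImportError:
        print("PyTorch is not installed; install it first.")
        return
    try:
        extra = dfloat11_extra(torch.version.cuda)
    except ValueError as err:
        print(err)
        return
    print(f'pip install -U "dfloat11[{extra}]"')


if __name__ == "__main__":
    main()
```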

	2. Install or upgrade the diffusers package.

	```bash
	pip install -U diffusers
	```

	3. To use the DFloat11 model, run the following example code in Python:
	```python
	import torch
	from diffusers import StableDiffusion3Pipeline
	from dfloat11 import DFloat11Model

	# Load the original BFloat16 pipeline; its transformer is patched below.
	pipe = StableDiffusion3Pipeline.from_pretrained(
	    "stabilityai/stable-diffusion-3.5-large",
	    torch_dtype=torch.bfloat16,
	)
	# Keep idle components on the CPU and move each one to the GPU only when it runs.
	pipe.enable_model_cpu_offload()

	# Load the DFloat11-compressed weights into pipe.transformer (modified in place).
	DFloat11Model.from_pretrained(
	    "DFloat11/stable-diffusion-3.5-large-DF11",
	    device="cpu",
	    bfloat16_model=pipe.transformer,
	)

	image = pipe(
	    "A capybara holding a sign that reads Hello World",
	    num_inference_steps=28,
	    guidance_scale=3.5,
	).images[0]
	image.save("capybara.png")
	```

	### 📄 Learn More

	* Paper: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
	* GitHub: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
	* HuggingFace: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)