The FLUX.1-dev model with its transformer and T5 text encoder quantized to 4-bit NF4 via bitsandbytes.

Usage

pip install -U diffusers transformers accelerate bitsandbytes

from diffusers import FluxPipeline
import torch

pipeline = FluxPipeline.from_pretrained("eramth/flux-4bit", torch_dtype=torch.float16).to("cuda")

# VAE tiling lets you generate higher-resolution images without much extra VRAM usage.
pipeline.vae.enable_tiling()

image = pipeline(prompt="a cute cat", num_inference_steps=25, guidance_scale=3.5).images[0]
image  # displayed automatically in a notebook; otherwise save it, e.g. image.save("cat.png")
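
If VRAM is still tight, diffusers pipelines also support model CPU offloading, which keeps components on the CPU and moves each one to the GPU only while it runs. This is optional and not part of the model itself; the sketch below assumes a recent diffusers/accelerate install:

from diffusers import FluxPipeline
import torch

pipeline = FluxPipeline.from_pretrained("eramth/flux-4bit", torch_dtype=torch.float16)
# Instead of .to("cuda"), offload components and load each onto the GPU only while
# it is executing. This is slower per image but lowers peak VRAM usage.
pipeline.enable_model_cpu_offload()
pipeline.vae.enable_tiling()
image = pipeline(prompt="a cute cat", num_inference_steps=25, guidance_scale=3.5).images[0]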

You can create this quantized model yourself with the following script:

from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from diffusers import FluxPipeline,FluxTransformer2DModel
from transformers import T5EncoderModel
import torch

token = ""    # your Hugging Face access token (needed to download FLUX.1-dev and to push)
repo_id = ""  # destination repo on the Hub for the quantized model

# 4-bit NF4 quantization config for the T5 text encoder (transformers backend).
quant_config = TransformersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)

text_encoder_2_4bit = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
    token=token
)

# The same 4-bit NF4 config, but from diffusers, for the Flux transformer.
quant_config = DiffusersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)

transformer_4bit = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
    token=token
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer_4bit,
    text_encoder_2=text_encoder_2_4bit,
    torch_dtype=torch.float16,
    token=token
)

# Upload the assembled pipeline (with the 4-bit transformer and text encoder) to the Hub.
pipe.push_to_hub(repo_id, token=token)
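
Once the components are quantized, you can optionally check how much memory they occupy. This is a small sketch assuming a diffusers/transformers version recent enough to expose get_memory_footprint() on both models:

# Rough weight-memory check for the 4-bit components (printed in GiB).
print(f"transformer: {transformer_4bit.get_memory_footprint() / 1024**3:.2f} GiB")
print(f"text_encoder_2: {text_encoder_2_4bit.get_memory_footprint() / 1024**3:.2f} GiB")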