# Hybrid-Sensitivity-Weighted-Quantization (HSWQ)

High-fidelity FP8 quantization for diffusion models (SDXL). Instead of a naive uniform cast, HSWQ uses sensitivity and importance analysis, and offers two modes: standard-compatible (V1) and high-performance scaled (V2).

Technical details: `md/HSWQ_ Hybrid Sensitivity Weighted Quantization.md`
## Overview
| Feature | V1: Standard Compatible | V2: High Performance Scaled |
|---|---|---|
| Compatibility | Full (100%), any FP8 loader | Custom loader (`HSWQLoader`) required |
| File format | Standard FP8 (`torch.float8_e4m3fn`) | Extended FP8 (weights + `.scale` metadata) |
| Image quality (SSIM) | ~0.95 (theoretical limit) | ~0.96+ (close to FP16) |
| Mechanism | Optimal clipping (smart clipping) | Full-range scaling (dynamic scaling) |
| Use case | Distribution, general users | In-house, max quality, server-side |
File size is reduced by about 50% vs. FP16 while keeping the best quality for each use case.
## Architecture
**Dual Monitor System** – during calibration, two metrics are collected:
- Sensitivity (output variance): layers that hurt image quality most if corrupted – the top 25% are kept in FP16.
- Importance (input mean absolute value): per-channel contribution – used as weights in the weighted histogram.

**Rigorous FP8 Grid Simulation** – uses a physical grid (all 0–255 bit patterns cast to `torch.float8_e4m3fn`) instead of theoretical formulas, so the measured MSE matches the real runtime.

**Weighted MSE Optimization** – finds the parameters that minimize quantization error using the importance histogram. All three steps are sketched below.
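The following PyTorch sketch shows how these three pieces could fit together. It is a minimal illustration under stated assumptions, not the repository's implementation: the function names (`collect_monitors`, `fp16_keep_set`, `search_amax`) are hypothetical, the hooks assume plain `nn.Linear` layers, and an actual cast to `torch.float8_e4m3fn` (PyTorch ≥ 2.1) stands in for the 0–255 grid, since the cast rounds to exactly those representable values.

```python
import torch
import torch.nn as nn

def collect_monitors(model, calib_batches):
    """Dual Monitor pass: per Linear layer, accumulate output variance
    (sensitivity) and per-input-channel mean |activation| (importance)."""
    sensitivity, importance, hooks = {}, {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            x = inputs[0].detach()
            sensitivity[name] = sensitivity.get(name, 0.0) + output.detach().float().var().item()
            # Mean absolute input per channel; accumulated sums are fine
            # because only relative magnitudes matter as histogram weights.
            chan = x.abs().float().mean(dim=tuple(range(x.dim() - 1)))
            importance[name] = importance.get(name, 0) + chan
        return hook

    for name, mod in model.named_modules():
        if isinstance(mod, nn.Linear):
            hooks.append(mod.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        for batch in calib_batches:  # whatever inputs the model expects
            model(batch)
    for h in hooks:
        h.remove()
    return sensitivity, importance

def fp16_keep_set(sensitivity, keep_ratio=0.25):
    """Rank layers by sensitivity; the top fraction stays in FP16."""
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    return set(ranked[: max(1, int(len(ranked) * keep_ratio))])

def fp8_round_trip(w, amax):
    """Simulate the physical grid: clip, cast to float8_e4m3fn, cast back.
    The cast rounds to exactly the values the runtime will see."""
    return w.clamp(-amax, amax).to(torch.float8_e4m3fn).float()

def search_amax(w, channel_importance, n_candidates=64):
    """V1-style optimal clipping: scan thresholds and keep the one with
    the lowest importance-weighted MSE against the FP8 round trip."""
    w = w.float()
    w_absmax = w.abs().max()
    best_amax, best_err = w_absmax.item(), float("inf")
    for frac in torch.linspace(0.5, 1.0, n_candidates):
        amax = w_absmax * frac
        err = (channel_importance * (w - fp8_round_trip(w, amax)) ** 2).mean()
        if err < best_err:
            best_amax, best_err = amax.item(), err.item()
    return best_amax
```

Simulating the round trip with a real cast, rather than a rounding formula, keeps the measured MSE aligned with runtime behavior, which is the point of the physical-grid approach.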
## Modes
- V1 (`scaled=False`): no scaling; only the clipping threshold (`amax`) is optimized. Output is standard FP8 weights. Use when you need maximum compatibility.
- V2 (`scaled=True`): weights are scaled to the full FP8 range, quantized, and the inverse scale `S` is stored in the Safetensors file (`.scale`). Use with `HSWQLoader` for best quality. Both modes are sketched below.
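A hedged sketch of the two modes, assuming PyTorch with FP8 support and a `safetensors` version that can serialize `float8_e4m3fn`. The `.scale` companion-key naming follows the description above, but the exact key layout and scale convention used by `HSWQLoader` are assumptions here, as are the helper names.

```python
import torch
from safetensors.torch import save_file, load_file

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def quantize_v1(w: torch.Tensor, amax: float) -> torch.Tensor:
    """V1: clip at the optimized threshold, then a plain FP8 cast.
    The result is standard FP8 that any FP8 loader can consume."""
    return w.clamp(-amax, amax).to(torch.float8_e4m3fn)

def quantize_v2(w: torch.Tensor):
    """V2: rescale to span the full FP8 range, quantize, and keep the
    inverse scale S so a loader can multiply it back in."""
    scale = w.abs().max() / FP8_E4M3_MAX
    q = (w / scale).to(torch.float8_e4m3fn)
    return q, scale.reshape(1)

# Hypothetical key layout: each weight plus a ".scale" companion tensor.
w = torch.randn(1280, 1280)
q, s = quantize_v2(w)
save_file({"layer.weight": q, "layer.weight.scale": s}, "v2_layer.safetensors")

# A loader in the spirit of HSWQLoader restores the original magnitude.
state = load_file("v2_layer.safetensors")
w_restored = state["layer.weight"].float() * state["layer.weight.scale"]
```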
## Recommended Parameters
- Samples: 256 (minimum for reliable statistics; 128 is insufficient).
- Keep ratio: 0.25 (25%) – keeps critical layers in FP16; 0.10 carries a higher degradation risk.
- Steps: 20–25 – to include early denoising sensitivity (see the config sketch below).
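For concreteness, here are the recommendations expressed as a calibration config. The dict keys are illustrative only; this excerpt does not document an actual HSWQ config schema.

```python
# Hypothetical settings dict mirroring the recommendations above.
hswq_config = {
    "num_samples": 256,   # 128 yields unreliable sensitivity statistics
    "keep_ratio": 0.25,   # top 25% most sensitive layers stay in FP16
    "num_steps": 25,      # 20-25 covers early denoising sensitivity
    "scaled": False,      # False = V1 (compatible), True = V2 (HSWQLoader)
}
```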
## Benchmark (Reference)
| Model | SSIM (Avg) | File size | Compatibility |
|---|---|---|---|
| Original FP16 | 1.0000 | 100% (6.5 GB) | High |
| Naive FP8 | 0.81–0.93 | 50% | High |
| HSWQ V1 | 0.86–0.95 | 55% (FP16 mixed) | High |
| HSWQ V2 | 0.87–0.96 | 55% (FP16 mixed) | Low (custom loader) |
HSWQ V1 gives a clear gain over Naive FP8 with full compatibility; V2 targets maximum quality with a custom loader.
## Setup
- VAE: use the standard SDXL VAE (place it in `models/vae/`).
## 📦 Available Models
| Filename | Base Model | Version | License |
|---|---|---|---|
| `realvisxlV50_v50_r128_svdq_fp4.safetensors` | RealVisXL V5.0 | v50.0 | CreativeML Open RAIL++-M |
| `waiRealCN_v10_r128_svdq_fp4.safetensors` | wai-RealCN | v10.0 | CreativeML Open RAIL++-M |
| `bluepencilXL_v031_r128_svdq_fp4.safetensors` | BluePencil-XL | v0.3.1 | CreativeML Open RAIL++-M |
| `waiIllustriousSDXL_v160_r128_svdq_fp4.safetensors` | waiIllustriousSDXL | v1.6.0 | CreativeML Open RAIL++-M |
| `koronemixIllustrious_v70_r128_svdq_fp4.safetensors` | koronemix-illustrious | v70.0 | CreativeML Open RAIL++-M |
| `novaanimeXL_v15_r128_svdq_fp4.safetensors` | Nova Anime XL | v15.0 | CreativeML Open RAIL++-M |
| `waiREALISM_v10_hswq_r256_s25_v1.safetensors` | waiREALISM | v1.0 | CreativeML Open RAIL++-M |
## Credits & License
### Special Acknowledgement
We extend our deepest respect and gratitude to the Nunchaku Team for their groundbreaking work on SVDQ quantization and for sharing their models with the community. This collection relies heavily on their research and original implementation.
- Original Repository: nunchaku-tech/nunchaku-sdxl
### Base Models
These models are derivatives of works by their respective creators. All credit for aesthetic tuning and model training belongs to the original creators.
- RealVisXL V5.0: Created by SG_161222.
- wai-RealCN: Created by wai.
- BluePencil-XL v0.3.1: Created by blue_pen.
- waiIllustriousSDXL: Created by wai.
- koronemix-illustrious: Created by korone.
- Nova Anime XL: Created by realdos.
### Software & Integration
- ComfyUI Loaders: The Nunchaku SDXL DiT Loader and LoRA Loader were developed and are maintained by ussoewwin (GitHub).
- Quantization Engine: Models quantized using the Nunchaku framework by MIT HAN Lab.
Disclaimer: These models are provided for optimization and research purposes. Please adhere to the original licenses of the base models.