πŸ’Ž Gemma 3 4B IT Abliterated


Gemma 3 12B Abliterated β€’ Gemma 3 27B Abliterated

This is an uncensored version of google/gemma-3-4b-it created with a new abliteration technique. See this article to learn more about abliteration.

While experimenting with model weights, I noticed that Gemma 3 was much more resilient to abliteration than other models like Qwen 2.5. I tried a few recipes to remove refusals while preserving most of the model's capabilities.

Note that this is fairly experimental, so it might not turn out as well as expected. I saw some garbled text from time to time (e.g., "It' my" instead of "It's my").

I recommend using these generation parameters: temperature=1.0, top_k=64, top_p=0.95.
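To illustrate what these parameters do at sampling time, here is a minimal NumPy sketch of temperature, top-k, and top-p (nucleus) filtering. The filtering order (top-k, then top-p) mirrors common implementations, but the logits here are synthetic and the function is only a demonstration, not the exact code any inference library uses.

```python
import numpy as np

def filter_logits(logits, temperature=1.0, top_k=64, top_p=0.95):
    """Return sampling probabilities after temperature/top-k/top-p filtering."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # Top-k: keep only the k highest-scoring tokens.
    if 0 < top_k < logits.size:
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits < kth, -np.inf, logits)
    # Softmax over the survivors.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-p (nucleus): keep the smallest set whose cumulative mass >= p.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    mask = np.zeros_like(probs, dtype=bool)
    mask[order[:cutoff]] = True
    probs = np.where(mask, probs, 0.0)
    return probs / probs.sum()

rng = np.random.default_rng(0)
p = filter_logits(rng.normal(size=1000))
print((p > 0).sum())  # number of tokens that survive the filters
```

With a 1000-token synthetic vocabulary, at most 64 tokens can survive (top-k), and usually fewer once the 0.95 nucleus cutoff is applied.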

⚑️ Quantization

βœ‚οΈ Layerwise abliteration


In the original technique, a refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples.

Here, the model was abliterated by computing a refusal direction based on hidden states (inspired by Sumandora's repo) for most layers (layers 7 to 29) independently. This is combined with a refusal weight that follows a symmetric pattern, from 0.05 up to a peak of 0.55.
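The idea above can be sketched in NumPy. To be clear about assumptions: the card does not publish the exact recipe, so this sketch takes the refusal direction as the normalized difference of mean hidden states on harmful vs. harmless prompts, the edit as a weighted projection removal, and the per-layer weight as a triangular schedule from 0.05 up to 0.55 and back across layers 7 to 29.

```python
import numpy as np

def refusal_direction(harmful, harmless):
    """Unit vector from the mean harmless to the mean harmful hidden state."""
    d = harmful.mean(axis=0) - harmless.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(hidden, direction, weight):
    """Remove a `weight` fraction of the projection onto `direction`."""
    return hidden - weight * (hidden @ direction) * direction

def layer_weight(layer, lo=7, hi=29, w_min=0.05, w_max=0.55):
    """Symmetric (triangular) weight schedule over the abliterated layers.

    The triangular shape is an assumption; the card only says the weights
    run symmetrically from 0.05 to a peak of 0.55.
    """
    mid = (lo + hi) / 2
    return w_min + (w_max - w_min) * (1 - abs(layer - mid) / (mid - lo))

# Synthetic hidden states standing in for real activations at one layer.
rng = np.random.default_rng(0)
harmful = rng.normal(size=(32, 64)) + 0.5
harmless = rng.normal(size=(32, 64))
d = refusal_direction(harmful, harmless)

h = rng.normal(size=64)                    # one residual-stream vector
h_edit = ablate(h, d, layer_weight(18))    # peak weight at the middle layer
```

In a real run this edit would be applied to each layer's hidden states (or baked into its weight matrices), with `layer_weight` giving the strength per layer.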

This yielded a very high acceptance rate (>90%) while still producing coherent outputs.

Model size: 4.3B parameters Β· Tensor type: BF16 (safetensors)
