toddric-3b-merged-v3-bnb4

Type: Qwen2.5-3B-Instruct, bnb-4bit (NF4, double-quant, bf16 compute)
What: 4-bit export of toddric-3b-merged-v3 for lower VRAM


TL;DR

  • Same model as …-merged-v3, packaged in bitsandbytes 4-bit.
  • Lower VRAM footprint (runs comfortably on 8 GB).
  • Slightly slower than bf16 on the same GPU (you trade some speed for lower VRAM).
  • Requires bitsandbytes at runtime.

How to load (Transformers + bitsandbytes)

from transformers import AutoTokenizer, AutoModelForCausalLM

mid = "toddie314/toddric-3b-merged-v3-bnb4"
tok = AutoTokenizer.from_pretrained(mid, use_fast=True)
tok.padding_side = "left"
tok.pad_token = tok.pad_token or tok.eos_token

# Quantization settings are saved in quantization_config.json
model = AutoModelForCausalLM.from_pretrained(
    mid,
    device_map={"": 0},
    attn_implementation="eager",
    low_cpu_mem_usage=True,
)
model.config.use_cache = True
model.generation_config.pad_token_id = tok.pad_token_id
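
For reference, the stored settings correspond to the bitsandbytes config below. This is a sketch based on the NF4 / double-quant / bf16-compute description above; passing it explicitly is optional and only useful if you want to override something such as the compute dtype.

import torch
from transformers import BitsAndBytesConfig

# Equivalent of the shipped quantization settings (NF4, double quantization,
# bf16 compute). Pass quantization_config=bnb_cfg to from_pretrained only if
# you want to override what the repo already stores.
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)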

If your stack doesn't pull in bitsandbytes automatically, install it with pip install bitsandbytes.
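
A quick generation smoke test, assuming the tokenizer ships the Qwen chat template (standard for Qwen2.5-Instruct derivatives); the prompt and max_new_tokens value are only illustrative.

# Greedy generation, matching the shipped defaults.
messages = [{"role": "user", "content": "Summarize NF4 quantization in two sentences."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=128)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))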


Evaluation (acceptance pack)

57-prompt persona acceptance suite (content constraints + speed).

  • Result (bnb-4bit, RTX 4060 8 GB): 100% pass; median ~9.5 tok/s (min ~9.0, max ~9.8) with attn_implementation="eager".

Expect slightly lower throughput than bf16 on the same card; the benefit is the smaller memory footprint.
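
To sanity-check throughput on your own card, here is a rough timing sketch (not the acceptance pack itself; prompt and token budget are arbitrary).

import time
import torch

# Rough tokens/sec: time one greedy generation, divide new tokens by wall time.
messages = [{"role": "user", "content": "Explain the trade-offs of 4-bit NF4 quantization."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(inputs, max_new_tokens=200)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[-1] - inputs.shape[-1]
print(f"{new_tokens / elapsed:.1f} tok/s")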


Notes / Gotchas

  • Use attn_implementation="eager" on 8–12 GB GPUs for predictable speed.
  • If you see warnings about unused quant keys, ensure quantization_config.json matches bitsandbytes NF4.
  • Greedy decoding defaults are provided via generation_config.json; enable sampling for creative tasks (see the sketch after this list).
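
A sketch of overriding the greedy defaults per call; the temperature and top_p values are generic starting points, not tuned for this model.

# Per-call sampling overrides the greedy defaults from generation_config.json.
messages = [{"role": "user", "content": "Write a short limerick about quantization."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,    # enable sampling instead of greedy decoding
    temperature=0.7,   # illustrative values, not tuned for this model
    top_p=0.9,
)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))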

Provenance

This is a 4-bit export of toddie314/toddric-3b-merged-v3. See that card for training details and data notes.
