GGUF Version
GGUF with quantized weights, allowing you to run this model in KoboldCPP and other GGUF-compatible AI environments!
Quantizations:
| Quant Type | Benefits | Cons |
|---|---|---|
| Q4_K_M | ✅ Smallest size (fastest inference)<br>✅ Requires the least VRAM/RAM<br>✅ Ideal for edge devices & low-resource setups | ❌ Lowest accuracy compared to other quants<br>❌ May struggle with complex reasoning<br>❌ Can produce slightly degraded text quality |
| Q5_K_M | ✅ Better accuracy than Q4, while still compact<br>✅ Good balance between speed and precision<br>✅ Works well on mid-range GPUs | ❌ Slightly larger model size than Q4<br>❌ Needs a bit more VRAM than Q4<br>❌ Still not as accurate as higher-bit quants |
| Q8_0 | ✅ Highest accuracy (closest to the full-precision model)<br>✅ Best for complex reasoning & detailed outputs<br>✅ Suitable for high-end GPUs & serious workloads | ❌ Requires significantly more VRAM/RAM<br>❌ Slower inference compared to Q4 & Q5<br>❌ Largest file size (takes more storage) |
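As a rough guide to the size trade-offs in the table above, a GGUF file's footprint scales with bits per weight. The sketch below estimates on-disk size for a ~3.2B-parameter model; the bits-per-weight figures are approximate averages for these quant types, not exact numbers for this model:

```python
# Approximate average bits per weight for common GGUF quant types.
# These are ballpark figures; actual GGUF files vary slightly per model.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.50}

def estimate_size_gib(n_params: float, quant: str) -> float:
    """Estimate quantized model size in GiB from parameter count."""
    total_bits = n_params * BITS_PER_WEIGHT[quant]
    return total_bits / 8 / (1024 ** 3)  # bits -> bytes -> GiB

# Llama-3.2-3B has roughly 3.2 billion parameters (assumption).
for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimate_size_gib(3.2e9, quant):.1f} GiB")
```

Expect roughly double the footprint going from Q4_K_M to Q8_0, which matches the VRAM notes in the table; leave extra headroom for the KV cache and context window on top of the weights.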
Model Details:
Read the full model details on the Hugging Face page for the original model, N-Bot-Int/OpenElla3-Llama3.2B.
Model tree for N-Bot-Int/OpenElla3-Llama3.2B-GGUF:
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Adapter: N-Bot-Int/OpenRP3B-Llama3.2
- Adapter: N-Bot-Int/OpenElla3-Llama3.2B