MMLU Pro benchmark for GGUFs (1 shot)
"Not all quantized models perform well." Serving frameworks: ollama uses the NVIDIA GPU; llama.cpp uses the CPU with AVX & AMX.
- unsloth/GLM-4.5-Air-GGUF — Text Generation • 110B • Updated 28 days ago • 126k • 84
- unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF — 31B • Updated Jul 31 • 54.9k • 85
- unsloth/DeepSeek-V3-0324-GGUF-UD — Text Generation • 671B • Updated Apr 28 • 4.48k • 17
- unsloth/cogito-v2-preview-llama-109B-MoE-GGUF — 108B • Updated Jul 31 • 6.34k • 8
OCR Models
- numind/NuMarkdown-8B-Thinking — Image-to-Text • 8B • Updated 13 days ago • 9.32k • 195
- rednote-hilab/dots.ocr — Image-Text-to-Text • 3B • Updated 15 days ago • 194k • 881
- microsoft/kosmos-2.5-chat — Image-Text-to-Text • 1B • Updated 5 days ago • 593 • 21
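Any of the GGUF repos above can be pulled and run locally with llama.cpp on CPU. A minimal sketch, assuming llama.cpp is installed and `huggingface-cli` is available; the quant filename pattern is an assumption, so check the repo's file list for the actual names:

```shell
# Download one quant from a listed repo (the *Q4_K_M* pattern is an
# assumption; inspect the repo's files for the real quant names).
huggingface-cli download unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF \
  --include "*Q4_K_M*" --local-dir ./models

# Run on CPU with llama.cpp; AVX/AMX code paths are picked at build/run time.
# <quant-file> is a placeholder for whichever file the download produced.
llama-cli -m ./models/<quant-file>.gguf -p "Hello" -n 64
```

These are one-off CLI invocations against a multi-gigabyte local model, so they are shown as a command fragment rather than a runnable test.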