llama.cpp is 26.8% faster than ollama. I have upgraded both, and using the same settings, I am running the same DeepSeek R1 Distill 1.5B on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:

Model loading:
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing:
llama.cpp 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation:
llama.cpp 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time of 7.64 sec

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
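For anyone who wants to reproduce the comparison, here is a minimal sketch of how the per-phase numbers could be pulled from each engine's HTTP API. It assumes a llama-server instance on localhost:8080 and ollama on localhost:11434, both serving the same DeepSeek R1 Distill 1.5B; the endpoint paths and field names follow the projects' public API docs at the time of writing and may differ between versions.

```python
# Rough sketch: query both engines with the same prompt and print their reported timings.
# Assumes llama-server on :8080 and ollama on :11434 with the model already pulled/loaded.
import requests

PROMPT = "Explain the difference between a mutex and a semaphore."

# llama.cpp server: /completion returns a "timings" object with values in milliseconds.
r = requests.post("http://localhost:8080/completion",
                  json={"prompt": PROMPT, "n_predict": 256}).json()
t = r["timings"]
print(f"llama.cpp  prompt: {t['prompt_per_second']:.2f} tok/s ({t['prompt_ms']:.0f} ms), "
      f"generation: {t['predicted_per_second']:.2f} tok/s ({t['predicted_ms']:.0f} ms)")

# ollama: /api/generate returns durations in nanoseconds.
r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "deepseek-r1:1.5b", "prompt": PROMPT,
                        "stream": False}).json()
pp = r["prompt_eval_count"] / (r["prompt_eval_duration"] / 1e9)
gen = r["eval_count"] / (r["eval_duration"] / 1e9)
print(f"ollama     load: {r['load_duration'] / 1e6:.0f} ms, "
      f"prompt: {pp:.2f} tok/s, generation: {gen:.2f} tok/s, "
      f"total: {r['total_duration'] / 1e9:.2f} s")
```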
GGUF LoRA adapters (Collection) — Adapters extracted from fine-tuned models, using mergekit-extract-lora • 16 items • Updated 7 days ago • 4
ngxson/DeepSeek-R1-Distill-Qwen-7B-abliterated-GGUF — Text Generation • Updated 7 days ago • 1.99k • 3
Extracted LoRA (mergekit) (Collection) — PEFT-compatible LoRA adapters produced by mergekit-extract-lora • 17 items • Updated 7 days ago • 3