docker pull ghcr.io/ggerganov/llama.cpp:server

Assuming the mistral-7B-instruct-v0.2-q8.gguf file has been downloaded to the /path/to/models directory on the local machine, run the container and serve the model with:

docker run -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server -m /models/mistral-7B-instruct-v0.2-q8.gguf --port 8000 --host 0.0.0.0 -n 512
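
Before opening the browser, you can check that the server has finished loading the model by polling its health endpoint (a minimal sketch, assuming the port mapping above; llama.cpp's server exposes a GET /health route):

# should return an OK status once the model is loaded
curl http://localhost:8000/health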
  • Test the deployment by accessing the server in a browser at http://localhost:8000
  • The llama.cpp server also provides an OpenAI-compatible API (see the curl sketch after the CUDA example below)
  • Deployment on a CUDA GPU:
docker pull ghcr.io/ggerganov/llama.cpp:server-cuda
docker run --gpus all -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server-cuda -m /models/mistral-7B-instruct-v0.2-q8.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 50
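
As a sketch of the OpenAI-compatible API mentioned above, the following request targets the server's /v1/chat/completions endpoint. The host and port follow the examples above, and the "model" value is a placeholder: the server answers with whichever GGUF model was loaded at startup.

# the "model" field is a placeholder; the loaded GGUF is used regardless
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-7B-instruct-v0.2-q8",
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
    "max_tokens": 128
  }'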