self-hosted api

  • run it with gguf-connector (pip install gguf-connector if you don't have it yet); activate the backend in console/terminal by
ggc w8
  • choose your model* file

    GGUF available. Select which one to use:

    1. sd3.5-2b-lite-iq4_nl.gguf [1.74GB]
    2. sd3.5-2b-lite-mxfp4_moe.gguf [2.86GB]

    Enter your choice (1 to 2): _

    *currently accepts sd3.5 2b model gguf; this gives the fastest experience, even on a low-tier gpu; frontend: https://test.gguf.org or localhost (see the decentralized frontend section below; a gguf download sketch follows the screenshot)

screenshot
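the model selector lists the gguf files available locally; if you still need to fetch one first, here is a minimal sketch using huggingface_hub (the repo id below is a placeholder, swap in the actual repo hosting the file):

    # minimal sketch: pull a gguf into the working directory first
    # repo_id is a placeholder; point it at the actual repo hosting the file
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="your-namespace/sd3.5-2b-lite-gguf",  # placeholder
        filename="sd3.5-2b-lite-iq4_nl.gguf",
        local_dir=".",  # keep it where the connector can find it
    )
    print(path)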

  • or opt for the fastapi lumina connector
ggc w7
  • choose your model* file

    GGUF available. Select which one to use:

    1. lumina2-q4_0.gguf [1.47GB]
    2. lumina2-q8_0.gguf [2.77GB]

    Enter your choice (1 to 2): _

    *as lumina has no lite version at the moment, you might need to increase the steps to around 25 for better output

  • or opt for the fastapi flux connector

ggc w6
  • choose your model* file

    GGUF available. Select which one to use:

    1. flux-dev-lite-q2_k.gguf [4.08GB]
    2. flux-krea-lite-q2_k.gguf [4.08GB]

    Enter your choice (1 to 2): _

    *accepts any flux model gguf; lite is recommended to save loading time (a minimal api client sketch follows the screenshot)

screenshot
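whichever image backend is running (w8, w7, or w6), it serves a fastapi app on localhost; the port, route, and payload fields below are assumptions, not the documented api, so check the console output printed at startup. a minimal client sketch:

    # minimal client sketch for the local backend
    # the port, /generate route, and payload fields are assumptions;
    # check the backend's startup log for the real ones
    import requests

    payload = {
        "prompt": "a cat in a hat",
        "steps": 25,  # lumina tends to want ~25 steps; the lite models can go lower
    }
    resp = requests.post("http://localhost:8000/generate", json=payload, timeout=600)
    resp.raise_for_status()
    with open("output.png", "wb") as f:
        f.write(resp.content)  # assumes the route returns raw image bytes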

  • flexible frontend choice (see below)

decentralized frontend

  • option 1: https://test.gguf.org; open it in a browser while the backend is running

screenshot

  • option 2: localhost; keep the backend running, open a new terminal session, then execute
ggc b

screenshot
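before loading the frontend, you can confirm the backend is actually listening; a quick probe sketch (port 8000 is again an assumption, read the real one from the backend's startup log):

    # quick liveness probe before opening the frontend
    # port 8000 is an assumption; use the one shown in the backend's startup log
    import socket

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(2)
        ok = s.connect_ex(("127.0.0.1", 8000)) == 0
    print("backend reachable" if ok else "backend not reachable yet")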

Prompt: a cat in a hat
Prompt: a raccoon in a hat
Prompt: a dog walking in a cyber city with joy

self-hosted api (edit)

  • run it with gguf-connector; activate the backend in console/terminal by
ggc e8
  • choose your model file

    GGUF available. Select which one to use:

    1. flux-kontext-lite-q2_k.gguf [4.08GB]

    Enter your choice (1 to 1): _

screenshot
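kontext-style editing takes a source image plus an instruction; a minimal upload sketch, again with a hypothetical route and port (check the console output of ggc e8 for the real ones):

    # minimal edit-request sketch: send a source image plus an instruction
    # the /edit route and port are assumptions; check the console output of ggc e8
    import requests

    with open("input.png", "rb") as src:
        resp = requests.post(
            "http://localhost:8000/edit",
            files={"image": ("input.png", src, "image/png")},
            data={"prompt": "put a hat on the cat"},
            timeout=600,
        )
    resp.raise_for_status()
    with open("edited.png", "wb") as out:
        out.write(resp.content)  # assumes the route returns the edited image bytes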

decentralized frontend - select Edit from the pulldown menu (stage 1: currently an exclusive trial for 🐷 holders)

screenshot

screenshot

  • option 2: localhost; keep the backend running, open a new terminal session, then execute
ggc a

screenshot

self-hosted api (plus)

  • run it with gguf-connector; activate the backend in console/terminal by
ggc e9
  • choose your model file

    Safetensors available. Select which one to use:

    1. sketch-s9-20b-fp4.safetensors (for blackwell cards, 11.9GB)
    2. sketch-s9-20b-int4.safetensors (for non-blackwell cards, 11.5GB)

    Enter your choice (1 to 2): _
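fp4 is for blackwell-class cards, which report cuda compute capability 10.x or higher; if unsure which file applies to your gpu, a quick check with torch (a sketch, assuming torch with cuda support is installed):

    # quick check: blackwell cards report compute capability 10.x or higher,
    # so they can take the fp4 file; older cards should use the int4 file
    import torch

    major, minor = torch.cuda.get_device_capability(0)
    print(f"compute capability {major}.{minor}")
    print("pick fp4 (option 1)" if major >= 10 else "pick int4 (option 2)")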

decentralized frontend - select Plus from the pulldown menu (stage 1: currently an exclusive trial for 🐷 holders)

screenshot

  • option 2: localhost; keep the backend running, open a new terminal session, then execute
ggc a

screenshot
