Issue with llama.cpp

Opened by wsbagnsv1

I've managed to convert the model to GGUF + mmproj GGUF with a small tweak to the conversion script, but when I load the model, it can't detect images in its context. There's no "image not supported" error, though?
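For context, the standard llama.cpp conversion flow (before whatever tweak was needed) is roughly the following; the model path and output names are placeholders, and the tweak itself isn't shown here:

    # convert the language model to GGUF
    python convert_hf_to_gguf.py /path/to/InternVL3_5-GPT-OSS-20B --outtype f16 --outfile model-f16.gguf
    # convert the multimodal projector separately
    python convert_hf_to_gguf.py /path/to/InternVL3_5-GPT-OSS-20B --mmproj --outfile mmproj-f16.gguf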

Just checking in to say I would love llama.cpp support/a GGUF for this 🙏

I mean, I can upload some quants if you want, but llama.cpp support is another beast, so lobby whoever you know to get that done 😅

I'm currently trying to upload quants via Colab, which would make everything a lot easier since my own upload speed sucks. That way I can upload more quants for the 38B model, since that one gets downloaded a LOT because nobody else has the mmproj lol
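For anyone trying the same Colab route, the upload side is roughly this with the Hugging Face CLI; the repo and file names here are placeholders, and a write-scoped token is needed:

    pip install -U "huggingface_hub[cli]"
    huggingface-cli login   # paste a write-scoped token when prompted
    # push one quant file into the target repo
    huggingface-cli upload <user>/<repo>-gguf ./model-Q4_K_M.gguf model-Q4_K_M.gguf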

So you were able to convert it to a GGUF, but the actual vision still isn't working in llama.cpp, correct? And yeah, if you upload a GGUF of this model I'd love to tinker with it!

This model as a concept is basically perfect for my use case but I'm afraid it'll never actually get supported 😭

Yeah, it's weird. As I said, it doesn't throw any error; it just loads but can't see the actual image for whatever reason. Once I'm done with the 38B, I'll upload at least the mmproj and some quants for the 20B.

Okay, the Colab thing seems to work, so I can upload the 20B stuff from my own PC in parallel (;

Awesome, thanks! I'm gonna play around with it and see if I have any luck at all

I'll go and add more quants if you're successful (;

It worked for me! 🎉

(attached screenshot: Screenshot 2025-08-29 at 12.25.07 PM.png)

I'm using the latest commit from llama.cpp and Open WebUI v0.6.26. This is the llama-server command I used:

./build/bin/llama-server -hf QuantStack/InternVL3_5-GPT-OSS-20B-A4B-Preview-gguf:Q4_0 \
    -ngl 99 \
    -np 5 \
    --port 8010 \
    --host 0.0.0.0 \
    -fa \
    --no-mmap \
    -a "[alias]" \
    -c 131072 \
    --main-gpu 0 \
    --jinja \
    --temp 1.0 \
    --top-p 1.0 \
    --top-k 0 \
    --no-webui
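For what it's worth, as far as I know -hf also fetches the mmproj from the repo automatically when one is present. A quick way to sanity-check that vision is actually working against that server is llama-server's OpenAI-compatible endpoint; a rough sketch, assuming bash, GNU base64, and a local test.png:

    # encode an image inline and ask the model about it
    # (on macOS, use `base64 -i test.png` instead of `base64 -w0 test.png`)
    curl http://localhost:8010/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$(base64 -w0 test.png)"'"}}
          ]
        }]
      }'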

Could you share the conversion script you used?

I'd love more quants! 🎉

So they probably already fixed the issue!

Nice! Which quant do you want specifically? I'll add the others later on.

Maybe something like a Q6_K_L?
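For reference, Q6_K_L is the community naming, popularized by bartowski's quants, for Q6_K with the token embedding and output tensors kept at Q8_0; with llama-quantize that would be roughly the following, filenames as placeholders:

    # Q6_K base quant with embeddings/output held at Q8_0
    ./build/bin/llama-quantize --token-embedding-type q8_0 --output-tensor-type q8_0 \
        model-f16.gguf model-Q6_K_L.gguf Q6_K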

And unrelated, but I would absolutely love a merge of the Intern vision model with huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated.

I'm not into merging 😅
If you find someone who does it, I can do the quants, though.
