Issue with llama.cpp

Opened by wsbagnsv1

I've managed to convert the model to GGUF + mmproj GGUF with a small tweak to the conversion script, but when I load the model, it can't detect images in its context. There's no "image not supported" error, though?
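For context, the standard llama.cpp conversion flow (before whatever tweak was needed) is roughly the following; the model path and output names are placeholders, and the tweak itself isn't shown here:

    # convert the language model to GGUF
    python convert_hf_to_gguf.py /path/to/InternVL3_5-GPT-OSS-20B --outtype f16 --outfile model-f16.gguf
    # convert the multimodal projector separately
    python convert_hf_to_gguf.py /path/to/InternVL3_5-GPT-OSS-20B --mmproj --outfile mmproj-f16.gguf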

Just checking in to say I would love llama.cpp support/a GGUF for this 🙏

I mean, I can upload some quants if you want, but llama.cpp support is another beast, so lobby whoever you know to get that done 😅

I'm currently trying to upload quants via Colab, which would make everything a lot easier since my own upload speed sucks. That way I can upload more quants for the 38B model, since that one gets downloaded a LOT because nobody else has the mmproj lol
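For anyone trying the same Colab route, the upload side is roughly this with the Hugging Face CLI; the repo and file names here are placeholders, and a write-scoped token is needed:

    pip install -U "huggingface_hub[cli]"
    huggingface-cli login   # paste a write-scoped token when prompted
    # push one quant file into the target repo
    huggingface-cli upload <user>/<repo>-gguf ./model-Q4_K_M.gguf model-Q4_K_M.gguf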

So you were able to convert it to a GGUF, but the actual vision still isn't working in llama.cpp, correct? And yeah, if you upload a GGUF of this model I'd love to tinker with it!

This model as a concept is basically perfect for my use case but I'm afraid it'll never actually get supported 😭

Yeah, it's weird. As I said, it doesn't throw any error; it just loads but can't see the actual image for whatever reason. Once I'm done with the 38B, I'll upload at least the mmproj and some quants for the 20B.

Okay, the Colab thing seems to work, so I can upload the 20B stuff from my own PC in parallel (;

Awesome, thanks! I'm gonna play around with it and see if I have any luck at all

I'll go and add more quants if you're successful (;

It worked for me! 🎉

(attached screenshot: Screenshot 2025-08-29 at 12.25.07 PM.png)

I'm using the latest commit from llama.cpp and Open WebUI v0.6.26. This is the llama-server command I used:

./build/bin/llama-server -hf QuantStack/InternVL3_5-GPT-OSS-20B-A4B-Preview-gguf:Q4_0 \
    -ngl 99 \
    -np 5 \
    --port 8010 \
    --host 0.0.0.0 \
    -fa \
    --no-mmap \
    -a "[alias]" \
    -c 131072 \
    --main-gpu 0 \
    --jinja \
    --temp 1.0 \
    --top-p 1.0 \
    --top-k 0 \
    --no-webui
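For what it's worth, as far as I know -hf also fetches the mmproj from the repo automatically when one is present. A quick way to sanity-check that vision is actually working against that server is llama-server's OpenAI-compatible endpoint; a rough sketch, assuming bash, GNU base64, and a local test.png:

    # encode an image inline and ask the model about it
    # (on macOS, use `base64 -i test.png` instead of `base64 -w0 test.png`)
    curl http://localhost:8010/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$(base64 -w0 test.png)"'"}}
          ]
        }]
      }'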

Could you share the conversion script you used?

I'd love more quants! 🎉

So they probably already fixed the issue!

Nice! Which quant do you want specifically? I'll add the others later on.

Maybe something like a Q6_K_L?
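For reference, Q6_K_L is the community naming, popularized by bartowski's quants, for Q6_K with the token embedding and output tensors kept at Q8_0; with llama-quantize that would be roughly the following, filenames as placeholders:

    # Q6_K base quant with embeddings/output held at Q8_0
    ./build/bin/llama-quantize --token-embedding-type q8_0 --output-tensor-type q8_0 \
        model-f16.gguf model-Q6_K_L.gguf Q6_K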

And unrelated, but I would absolutely love a merge of the Intern vision model with huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated.

I'm not into merging 😅
If you find someone who does it, I can do the quants, though.
