Update README.md
README.md CHANGED
@@ -22,6 +22,16 @@ Run them in [LM Studio](https://lmstudio.ai/)
 
 Run them directly with [llama.cpp](https://github.com/ggerganov/llama.cpp), or any other llama.cpp based project
 
+## Vision
+
+This model has vision capabilities, more details here: https://github.com/ggml-org/llama.cpp/pull/12344
+
+After building with Gemma 3 clip support, run the following command:
+
+```
+./build/bin/llama-gemma3-cli -m google_gemma-3-12b-it-Q8_0.gguf --mmproj mmproj-google_gemma-3-12b-it-f16.gguf
+```
+
 ## Prompt format
 
 ```
@@ -38,6 +48,8 @@ Run them directly with [llama.cpp](https://github.com/ggerganov/llama.cpp), or a
 
 | Filename | Quant type | File Size | Split | Description |
 | -------- | ---------- | --------- | ----- | ----------- |
+| [mmproj-gemma-3-12b-it-f32.gguf](https://huggingface.co/bartowski/google_gemma-3-12b-it-GGUF/blob/main/mmproj-google_gemma-3-12b-it-f32.gguf) | f32 | 1.68GB | false | F32 format MMPROJ file, required for vision. |
+| [mmproj-gemma-3-12b-it-f16.gguf](https://huggingface.co/bartowski/google_gemma-3-12b-it-GGUF/blob/main/mmproj-google_gemma-3-12b-it-f16.gguf) | f16 | 851MB | false | F16 format MMPROJ file, required for vision. |
 | [gemma-3-12b-it-bf16.gguf](https://huggingface.co/bartowski/google_gemma-3-12b-it-GGUF/blob/main/google_gemma-3-12b-it-bf16.gguf) | bf16 | 23.54GB | false | Full BF16 weights. |
 | [gemma-3-12b-it-Q8_0.gguf](https://huggingface.co/bartowski/google_gemma-3-12b-it-GGUF/blob/main/google_gemma-3-12b-it-Q8_0.gguf) | Q8_0 | 12.51GB | false | Extremely high quality, generally unneeded but max available quant. |
 | [gemma-3-12b-it-Q6_K_L.gguf](https://huggingface.co/bartowski/google_gemma-3-12b-it-GGUF/blob/main/google_gemma-3-12b-it-Q6_K_L.gguf) | Q6_K_L | 9.90GB | false | Uses Q8_0 for embed and output weights. Very high quality, near perfect, *recommended*. |
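To try the new vision files, grab a quant plus the matching mmproj file from the repo. A minimal sketch using `huggingface-cli` (any filename from the table works; `--local-dir ./` is just an example destination):

```
# Install the CLI once: pip install -U "huggingface_hub[cli]"
huggingface-cli download bartowski/google_gemma-3-12b-it-GGUF \
  --include "google_gemma-3-12b-it-Q8_0.gguf" "mmproj-google_gemma-3-12b-it-f16.gguf" \
  --local-dir ./
```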
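The "run them directly with llama.cpp" line in the diff maps to the usual build-and-run flow. A hedged sketch using standard `llama-cli` flags (CPU build; the prompt text is illustrative):

```
# Build llama.cpp from source; add backend flags (CUDA, Metal, ...) as needed
cmake -B build
cmake --build build --config Release

# Text-only chat with the downloaded quant
./build/bin/llama-cli -m google_gemma-3-12b-it-Q8_0.gguf -p "Why is the sky blue?"
```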
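For vision, the diff shows `llama-gemma3-cli` launched without an image, which drops into interactive mode. If the one-shot flags shown in the linked PR are available in your build, passing an image directly should look roughly like this (the `--image`/`-p` usage and `test.jpg` are assumptions, not taken from this README):

```
# One-shot image description; flags assumed from the linked PR's examples
./build/bin/llama-gemma3-cli -m google_gemma-3-12b-it-Q8_0.gguf \
  --mmproj mmproj-google_gemma-3-12b-it-f16.gguf \
  --image test.jpg -p "Describe this image."
```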