---
base_model:
- nothingiisreal/MN-12B-Starcannon-v3
- MarinaraSpaghetti/NemoMix-Unleashed-12B
library_name: transformers
tags:
- mergekit
- merge
- llama-cpp
- gguf-my-repo
license: cc-by-nc-4.0
---
# Starcannon-Unleashed-12B-v1.0-GGUF
Static quants of VongolaChouko/Starcannon-Unleashed-12B-v1.0.
This model was converted to GGUF format from VongolaChouko/Starcannon-Unleashed-12B-v1.0 using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
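If you'd rather convert locally instead of using the space, llama.cpp's own tooling can do it; a minimal sketch, assuming a local snapshot of the original repo and a built llama.cpp checkout (script and binary names as in recent llama.cpp):

```bash
# Convert a local snapshot of the HF checkpoint to an F16 GGUF
python convert_hf_to_gguf.py ./Starcannon-Unleashed-12B-v1.0 \
  --outfile Starcannon-Unleashed-12B-v1.0-FP16.gguf --outtype f16

# Quantize the F16 GGUF down to Q6_K
./llama-quantize Starcannon-Unleashed-12B-v1.0-FP16.gguf \
  Starcannon-Unleashed-12B-v1.0-Q6_K.gguf Q6_K
```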
I recommend running these quants with koboldcpp. You can find its latest release here: koboldcpp-1.76
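A rough sketch of a koboldcpp launch (the binary name, layer count, and port here are illustrative; check `--help` for your release):

```bash
# Load the Q6_K quant with 8K context, offloading layers to the GPU
./koboldcpp --model Starcannon-Unleashed-12B-v1.0-Q6_K.gguf \
  --contextsize 8192 --gpulayers 40 --port 5001
```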
Download a file (not the whole branch) from the table below; a command-line download example follows the table.
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| Starcannon-Unleashed-12B-v1.0-FP16.gguf | f16 | 24.50GB | false | Full F16 weights. |
| Starcannon-Unleashed-12B-v1.0-Q8_0.gguf | Q8_0 | 13.02GB | false | Extremely high quality, generally unneeded but max available quant. |
| Starcannon-Unleashed-12B-v1.0-Q6_K.gguf | Q6_K | 10.06GB | false | Very high quality, near perfect, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q5_K_L.gguf | Q5_K_L | 9.14GB | false | Uses Q8_0 for embed and output weights. High quality, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q5_K_M.gguf | Q5_K_M | 8.73GB | false | High quality, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q5_K_S.gguf | Q5_K_S | 8.52GB | false | High quality, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q4_K_L.gguf | Q4_K_L | 7.98GB | false | Uses Q8_0 for embed and output weights. Good quality, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q4_K_M.gguf | Q4_K_M | 7.48GB | false | Good quality, default size for most use cases, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q3_K_XL.gguf | Q3_K_XL | 7.15GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| Starcannon-Unleashed-12B-v1.0-Q4_K_S.gguf | Q4_K_S | 7.12GB | false | Slightly lower quality with more space savings, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q4_0.gguf | Q4_0 | 7.09GB | false | Legacy format, generally not worth using over similarly sized formats. |
| Starcannon-Unleashed-12B-v1.0-Q4_0_8_8.gguf | Q4_0_8_8 | 7.07GB | false | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
| Starcannon-Unleashed-12B-v1.0-Q4_0_4_8.gguf | Q4_0_4_8 | 7.07GB | false | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
| Starcannon-Unleashed-12B-v1.0-Q4_0_4_4.gguf | Q4_0_4_4 | 7.07GB | false | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
| Starcannon-Unleashed-12B-v1.0-IQ4_XS.gguf | IQ4_XS | 6.74GB | false | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q3_K_L.gguf | Q3_K_L | 6.56GB | false | Lower quality but usable, good for low RAM availability. |
| Starcannon-Unleashed-12B-v1.0-Q3_K_M.gguf | Q3_K_M | 6.08GB | false | Low quality. |
| Starcannon-Unleashed-12B-v1.0-IQ3_M.gguf | IQ3_M | 5.72GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| Starcannon-Unleashed-12B-v1.0-Q3_K_S.gguf | Q3_K_S | 5.53GB | false | Low quality, not recommended. |
| Starcannon-Unleashed-12B-v1.0-Q2_K_L.gguf | Q2_K_L | 5.45GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| Starcannon-Unleashed-12B-v1.0-IQ3_XS.gguf | IQ3_XS | 5.31GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| Starcannon-Unleashed-12B-v1.0-Q2_K.gguf | Q2_K | 4.79GB | false | Very low quality but surprisingly usable. |
| Starcannon-Unleashed-12B-v1.0-IQ2_M.gguf | IQ2_M | 4.44GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
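To fetch just one quant without cloning the whole branch, the Hugging Face CLI works well; a minimal sketch, using the recommended Q6_K file from the table above:

```bash
# Install the Hugging Face CLI if needed
pip install -U "huggingface_hub[cli]"

# Download only the Q6_K quant into the current directory
huggingface-cli download VongolaChouko/Starcannon-Unleashed-12B-v1.0-GGUF \
  --include "Starcannon-Unleashed-12B-v1.0-Q6_K.gguf" --local-dir ./
```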
## Instruct
Both ChatML and Mistral formats should work fine. Personally, I tested this using ChatML and found that I like the model's responses better with that format. Try both and see which one you like best. :D
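For reference, ChatML wraps every turn in im_start/im_end tags; a typical prompt looks like this (the system line is just an example):

```
<|im_start|>system
You are {{char}} in a roleplay with {{user}}.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
```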
## Settings
I recommend using these settings: Starcannon-Unleashed-12B-v1.0-ST-Formatting-2024-10-29.json
IMPORTANT: Open SillyTavern and use "Master Import", which can be found under the "A" (Advanced Formatting) tab. Replace the "INSERT WORLD HERE" placeholders with the world/universe your character belongs to. If not applicable, just remove that part.
Temperature 1.15 - 1.25 is good, but lower should also work well, as long as you also tweak the Min P and XTC to ensure the model won't choke. Play around with it to see what suits your taste.
This is a modified version of MarinaraSpaghetti's Mistral-Small-Correct.json, transformed into ChatML.
You can find the original version here: MarinaraSpaghetti/SillyTavern-Settings
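If you're testing outside SillyTavern, the same sampler ideas map onto llama.cpp flags; a sketch, assuming a build recent enough to ship the Min P and XTC samplers (flag names may vary across versions):

```bash
# Temperature ~1.2 with a Min P floor; XTC values here are illustrative defaults
./llama-cli -m Starcannon-Unleashed-12B-v1.0-Q6_K.gguf \
  --temp 1.2 --min-p 0.05 \
  --xtc-probability 0.5 --xtc-threshold 0.1 \
  -p "Hello"
```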
## To use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.
CLI:
```bash
llama-cli --hf-repo VongolaChouko/Starcannon-Unleashed-12B-v1.0-Q6_K-GGUF --hf-file starcannon-unleashed-12b-v1.0-q6_k.gguf -p "The meaning to life and the universe is"
```
Server:
```bash
llama-server --hf-repo VongolaChouko/Starcannon-Unleashed-12B-v1.0-Q6_K-GGUF --hf-file starcannon-unleashed-12b-v1.0-q6_k.gguf -c 2048
```
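Once the server is up (it listens on port 8080 by default), you can query its completion endpoint; a minimal sketch:

```bash
curl http://localhost:8080/completion -H "Content-Type: application/json" \
  -d '{"prompt": "The meaning to life and the universe is", "n_predict": 128}'
```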
Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.
Step 1: Clone llama.cpp from GitHub.
```bash
git clone https://github.com/ggerganov/llama.cpp
```
Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1 flag along with any other hardware-specific flags (for example, LLAMA_CUDA=1 for NVIDIA GPUs on Linux).
```bash
cd llama.cpp && LLAMA_CURL=1 make
```
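For example, a CUDA-enabled build on Linux might look like this (assuming the Makefile-based build from the step above):

```bash
cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make -j$(nproc)
```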
Step 3: Run inference through the main binary.
```bash
./llama-cli --hf-repo VongolaChouko/Starcannon-Unleashed-12B-v1.0-Q6_K-GGUF --hf-file starcannon-unleashed-12b-v1.0-q6_k.gguf -p "The meaning to life and the universe is"
```
or
```bash
./llama-server --hf-repo VongolaChouko/Starcannon-Unleashed-12B-v1.0-Q6_K-GGUF --hf-file starcannon-unleashed-12b-v1.0-q6_k.gguf -c 2048
```