---
base_model:
- nothingiisreal/MN-12B-Starcannon-v3
- MarinaraSpaghetti/NemoMix-Unleashed-12B
library_name: transformers
tags:
- mergekit
- merge
- llama-cpp
- gguf-my-repo
license: cc-by-nc-4.0
---
# Starcannon-Unleashed-12B-v1.0-GGUF
Static quants of VongolaChouko/Starcannon-Unleashed-12B-v1.0.
This model was converted to GGUF format from VongolaChouko/Starcannon-Unleashed-12B-v1.0 using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
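If you'd rather convert locally instead of using the space, llama.cpp's own tooling can do it; a minimal sketch, assuming a local snapshot of the original repo and a built llama.cpp checkout (script and binary names as in recent llama.cpp):

```bash
# Convert a local snapshot of the HF checkpoint to an F16 GGUF
python convert_hf_to_gguf.py ./Starcannon-Unleashed-12B-v1.0 \
  --outfile Starcannon-Unleashed-12B-v1.0-FP16.gguf --outtype f16

# Quantize the F16 GGUF down to Q6_K
./llama-quantize Starcannon-Unleashed-12B-v1.0-FP16.gguf \
  Starcannon-Unleashed-12B-v1.0-Q6_K.gguf Q6_K
```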
I recommend running these quants with koboldcpp. You can find its latest release here: koboldcpp-1.76
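A rough sketch of a koboldcpp launch (the binary name, layer count, and port here are illustrative; check `--help` for your release):

```bash
# Load the Q6_K quant with 8K context, offloading layers to the GPU
./koboldcpp --model Starcannon-Unleashed-12B-v1.0-Q6_K.gguf \
  --contextsize 8192 --gpulayers 40 --port 5001
```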
Download a file (not the whole branch) from the table below; a command-line download example follows the table.
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| Starcannon-Unleashed-12B-v1.0-FP16.gguf | f16 | 24.50GB | false | Full F16 weights. |
| Starcannon-Unleashed-12B-v1.0-Q8_0.gguf | Q8_0 | 13.02GB | false | Extremely high quality, generally unneeded but max available quant. |
| Starcannon-Unleashed-12B-v1.0-Q6_K.gguf | Q6_K | 10.06GB | false | Very high quality, near perfect, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q5_K_L.gguf | Q5_K_L | 9.14GB | false | Uses Q8_0 for embed and output weights. High quality, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q5_K_M.gguf | Q5_K_M | 8.73GB | false | High quality, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q5_K_S.gguf | Q5_K_S | 8.52GB | false | High quality, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q4_K_L.gguf | Q4_K_L | 7.98GB | false | Uses Q8_0 for embed and output weights. Good quality, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q4_K_M.gguf | Q4_K_M | 7.48GB | false | Good quality, default size for most use cases, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q3_K_XL.gguf | Q3_K_XL | 7.15GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| Starcannon-Unleashed-12B-v1.0-Q4_K_S.gguf | Q4_K_S | 7.12GB | false | Slightly lower quality with more space savings, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q4_0.gguf | Q4_0 | 7.09GB | false | Legacy format, generally not worth using over similarly sized formats. |
| Starcannon-Unleashed-12B-v1.0-Q4_0_8_8.gguf | Q4_0_8_8 | 7.07GB | false | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
| Starcannon-Unleashed-12B-v1.0-Q4_0_4_8.gguf | Q4_0_4_8 | 7.07GB | false | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
| Starcannon-Unleashed-12B-v1.0-Q4_0_4_4.gguf | Q4_0_4_4 | 7.07GB | false | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
| Starcannon-Unleashed-12B-v1.0-IQ4_XS.gguf | IQ4_XS | 6.74GB | false | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| Starcannon-Unleashed-12B-v1.0-Q3_K_L.gguf | Q3_K_L | 6.56GB | false | Lower quality but usable, good for low RAM availability. |
| Starcannon-Unleashed-12B-v1.0-Q3_K_M.gguf | Q3_K_M | 6.08GB | false | Low quality. |
| Starcannon-Unleashed-12B-v1.0-IQ3_M.gguf | IQ3_M | 5.72GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| Starcannon-Unleashed-12B-v1.0-Q3_K_S.gguf | Q3_K_S | 5.53GB | false | Low quality, not recommended. |
| Starcannon-Unleashed-12B-v1.0-Q2_K_L.gguf | Q2_K_L | 5.45GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| Starcannon-Unleashed-12B-v1.0-IQ3_XS.gguf | IQ3_XS | 5.31GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| Starcannon-Unleashed-12B-v1.0-Q2_K.gguf | Q2_K | 4.79GB | false | Very low quality but surprisingly usable. |
| Starcannon-Unleashed-12B-v1.0-IQ2_M.gguf | IQ2_M | 4.44GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
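To fetch just one quant without cloning the whole branch, the Hugging Face CLI works well; a minimal sketch, using the recommended Q6_K file from the table above:

```bash
# Install the Hugging Face CLI if needed
pip install -U "huggingface_hub[cli]"

# Download only the Q6_K quant into the current directory
huggingface-cli download VongolaChouko/Starcannon-Unleashed-12B-v1.0-GGUF \
  --include "Starcannon-Unleashed-12B-v1.0-Q6_K.gguf" --local-dir ./
```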
## Instruct
Both ChatML and Mistral formats should work fine. Personally, I tested this using ChatML and found that I like the model's responses better with that format. Try both and see which one you like best. :D
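For reference, ChatML wraps every turn in im_start/im_end tags; a typical prompt looks like this (the system line is just an example):

```
<|im_start|>system
You are {{char}} in a roleplay with {{user}}.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
```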
## Settings
I recommend using these settings: Starcannon-Unleashed-12B-v1.0-ST-Formatting-2024-10-29.json
IMPORTANT: Open SillyTavern and use "Master Import", which can be found under the "A" (Advanced Formatting) tab. Replace the "INSERT WORLD HERE" placeholders with the world/universe your character belongs to. If not applicable, just remove that part.
Temperature 1.15 - 1.25 is good, but lower should also work well, as long as you also tweak the Min P and XTC to ensure the model won't choke. Play around with it to see what suits your taste.
This is a modified version of MarinaraSpaghetti's Mistral-Small-Correct.json, transformed into ChatML.
You can find the original version here: MarinaraSpaghetti/SillyTavern-Settings
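If you're testing outside SillyTavern, the same sampler ideas map onto llama.cpp flags; a sketch, assuming a build recent enough to ship the Min P and XTC samplers (flag names may vary across versions):

```bash
# Temperature ~1.2 with a Min P floor; XTC values here are illustrative defaults
./llama-cli -m Starcannon-Unleashed-12B-v1.0-Q6_K.gguf \
  --temp 1.2 --min-p 0.05 \
  --xtc-probability 0.5 --xtc-threshold 0.1 \
  -p "Hello"
```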
## To use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.
CLI:
```bash
llama-cli --hf-repo VongolaChouko/Starcannon-Unleashed-12B-v1.0-Q6_K-GGUF --hf-file starcannon-unleashed-12b-v1.0-q6_k.gguf -p "The meaning to life and the universe is"
```
Server:
```bash
llama-server --hf-repo VongolaChouko/Starcannon-Unleashed-12B-v1.0-Q6_K-GGUF --hf-file starcannon-unleashed-12b-v1.0-q6_k.gguf -c 2048
```
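Once the server is up (it listens on port 8080 by default), you can query its completion endpoint; a minimal sketch:

```bash
curl http://localhost:8080/completion -H "Content-Type: application/json" \
  -d '{"prompt": "The meaning to life and the universe is", "n_predict": 128}'
```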
Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.
Step 1: Clone llama.cpp from GitHub.
```bash
git clone https://github.com/ggerganov/llama.cpp
```
Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1 flag along with any other hardware-specific flags (for example, LLAMA_CUDA=1 for NVIDIA GPUs on Linux).
```bash
cd llama.cpp && LLAMA_CURL=1 make
```
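For example, a CUDA-enabled build on Linux might look like this (assuming the Makefile-based build from the step above):

```bash
cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make -j$(nproc)
```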
Step 3: Run inference through the main binary.
```bash
./llama-cli --hf-repo VongolaChouko/Starcannon-Unleashed-12B-v1.0-Q6_K-GGUF --hf-file starcannon-unleashed-12b-v1.0-q6_k.gguf -p "The meaning to life and the universe is"
```
or
```bash
./llama-server --hf-repo VongolaChouko/Starcannon-Unleashed-12B-v1.0-Q6_K-GGUF --hf-file starcannon-unleashed-12b-v1.0-q6_k.gguf -c 2048
```