free-solar-0.3-exl2
Original model: free-solar-slerp-v0.3
Model creator: freewheelin
Quants
4bpw h6 (main)
4.25bpw h6
4.65bpw h6
5bpw h6
6bpw h6
8bpw h8
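As a rough guide for picking a quant (my own back-of-the-envelope estimate, not a figure from the model card), the weight footprint is approximately the parameter count times bits per weight:

```python
def quant_size_gb(n_params: float, bpw: float) -> float:
    """Approximate weight size in GB for a given bits-per-weight (bpw)."""
    return n_params * bpw / 8 / 1e9

# Rough weight sizes for an ~11B-parameter model at the listed quant levels.
# Actual VRAM use is higher: this excludes the KV cache and activation overhead.
for bpw in (4.0, 4.25, 4.65, 5.0, 6.0, 8.0):
    print(f"{bpw:.2f} bpw ≈ {quant_size_gb(11e9, bpw):.1f} GB")
```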
Quantization notes
Quantized with Exllamav2 0.0.15 using the default calibration dataset.
This model has unusually long loading times: a typical 11B model loads in about 30 s on my PC, but this one takes around 130 s.
I'm not sure why; it may be because the original weights were stored in FP32 instead of the usual FP16.
Overall VRAM usage and generation speed seem normal, though.
This seems to be primarily a Korean language model.
I didn't realize it at first when I tried it, since the language wasn't explicitly listed.
I'm unable to evaluate it in its main area, but it seems usable in English and, to some degree, in other languages.
When used in English, it sometimes switches topics at random or starts writing in Korean, as if it occasionally forgets to emit the stop token.
But it has an interesting writing style in English and overall seems quite reasonable, so I decided to make a full set of quants.
I was also curious to try quantizing an FP32 model: RAM requirements were higher, but the process went smoothly without any issues.
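The higher RAM use is expected: FP32 source weights take roughly twice the memory of FP16. A quick sanity check for an ~11B-parameter model (my own estimate, raw weight storage only):

```python
def weights_gb(n_params: float, bytes_per_param: int) -> float:
    # Raw weight storage only; excludes quantization working buffers.
    return n_params * bytes_per_param / 1e9

fp16 = weights_gb(11e9, 2)  # FP16: 2 bytes per parameter
fp32 = weights_gb(11e9, 4)  # FP32: 4 bytes per parameter
print(f"FP16: {fp16:.0f} GB, FP32: {fp32:.0f} GB")
```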
How to run
This quantization format runs on GPU and requires the Exllamav2 loader, which is available in the following applications:
Original model card
free-solar-0.3
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the SLERP merge method.
Models Merged
The following models were included in the merge: