Melvin56 committed on
Commit 268a5be
· verified · 1 Parent(s): f721fef

Update README.md

Files changed (1)
  1. README.md +30 -20
README.md CHANGED
@@ -8,33 +8,43 @@ pipeline_tag: text-generation
 developers: Kanana LLM
 training_regime: bf16 mixed precision
 base_model:
-- kakaocorp/kanana-nano-2.1b-instruct
+- huihui-ai/kanana-nano-2.1b-instruct-abliterated
 tags:
 - abliterated
 - uncensored
 ---
+# Melvin56/kanana-nano-2.1b-instruct-abliterated-GGUF
 
-# huihui-ai/kanana-nano-2.1b-instruct-abliterated
+Original Model : [huihui-ai/kanana-nano-2.1b-instruct-abliterated](https://huggingface.co/huihui-ai/kanana-nano-2.1b-instruct-abliterated)
 
+All quants are made using the imatrix dataset.
 
-This is an uncensored version of [kakaocorp/kanana-nano-2.1b-instruct](https://huggingface.co/kakaocorp/kanana-nano-2.1b-instruct) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to learn more about it).
-This is a crude, proof-of-concept implementation for removing refusals from an LLM without using TransformerLens.
 
+| Model  | Size (GB) |
+|:-------|----------:|
+| Q2_K_S |     0.914 |
+| Q2_K   |     0.931 |
+| Q3_K_M |     1.138 |
+| Q4_K_M |     1.385 |
+| Q5_K_M |     1.568 |
+| Q6_K   |     1.826 |
+| Q8_0   |     2.223 |
+| F16    |     4.177 |
+| F32    |     8.342 |
 
-## Use with ollama
-
-You can use [huihui_ai/kanana-nano-abliterated](https://ollama.com/huihui_ai/kanana-nano-abliterated) directly:
-```
-ollama run huihui_ai/kanana-nano-abliterated
-```
-
-### Donation
-
-If you like it, please click 'like' and follow us for more updates.
-You can follow [x.com/support_huihui](https://x.com/support_huihui) to get the latest model information from huihui.ai.
-
-##### Your donation helps us continue our development and improvement; even a cup of coffee helps.
-- bitcoin:
-```
-bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
-```
+|          | CPU (AVX2) | CPU (ARM NEON) | Metal  | cuBLAS | rocBLAS | SYCL     | CLBlast | Vulkan | Kompute |
+| :------- | :--------: | :------------: | :----: | :----: | :-----: | :------: | :-----: | :----: | :-----: |
+| K-quants | ✅         | ✅             | ✅     | ✅     | ✅      | ✅       | ✅ 🐢⁵  | ✅ 🐢⁵ | ❌      |
+| I-quants | ✅ 🐢⁴     | ✅ 🐢⁴         | ✅ 🐢⁴ | ✅     | ✅      | Partial¹ | ❌      | ❌     | ❌      |
 ```
+✅: feature works
+🚫: feature does not work
+❓: unknown, please contribute if you can test it yourself
+🐢: feature is slow
+¹: IQ3_S and IQ1_S, see #5886
+²: Only with -ngl 0
+³: Inference is 50% slower
+⁴: Slower than K-quants of comparable size
+⁵: Slower than cuBLAS/rocBLAS on similar cards
+⁶: Only q8_0 and iq4_nl
+```
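
As a rough sanity check on the size table in the new README, dividing each file size by the parameter count gives an approximate bits-per-weight figure for each quant. A minimal sketch, assuming roughly 2.1 billion parameters (inferred from the "2.1b" in the model name, not stated in the table):

```python
# Approximate bits per weight for selected quants from the size table.
# Assumption: ~2.1e9 parameters, inferred from the model name.
sizes_gb = {
    "Q2_K": 0.931,
    "Q4_K_M": 1.385,
    "Q8_0": 2.223,
    "F16": 4.177,
}
params = 2.1e9

# size in bytes * 8 bits, spread over every weight
bits_per_weight = {name: gb * 1e9 * 8 / params for name, gb in sizes_gb.items()}

for name, bpw in bits_per_weight.items():
    print(f"{name}: {bpw:.2f} bits/weight")
```

F16 comes out near 16 bits per weight, consistent with the assumed parameter count; the remaining fraction of a bit in each quant is GGUF metadata plus tensors (such as embeddings) that are typically kept at higher precision.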