qwp4w3hyb committed · commit cf9616d · verified · 1 parent: d1925e5

Update README.md

Files changed (1):
  1. README.md (+16 -19)
README.md CHANGED
@@ -1,16 +1,18 @@
 ---
-base_model: microsoft/Phi-3-mini-128k-instruct
 license: mit
-license_link: LICENSE
+license_link: >-
+  https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/LICENSE
 language:
-- en
+- multilingual
 pipeline_tag: text-generation
+base_model: microsoft/Phi-3-medium-128k-instruct
 tags:
 - nlp
 - code
 - microsoft
 - phi
-- phi-3
+- instruct
+- finetune
 - gguf
 - imatrix
 - importance matrix
@@ -18,22 +20,17 @@ tags:

 # Quant Infos

-- The 128k context is not fully supported by llama.cpp yet, but in my testing this model works fine up to 50k+ already
+- Requires latest llama.cpp master;
 - quants done with an importance matrix for improved quantization loss
-- quantized & generated imatrix from the f32 as f16 is inaccurate when converting from bf16
-- K & IQ quants in basically all variants from Q6_K down to IQ1_S
-
-Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [b4e4b8a9351d918a56831c73cf9f25c1837b80d1](https://github.com/ggerganov/llama.cpp/commit/b4e4b8a9351d918a56831c73cf9f25c1837b80d1) (master from 2024-04-24)
-
-Imatrix dataset was used from [here](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
-
-Using this command to generate the importance matrix from the f32.gguf
-
-```
-./imatrix -c 512 -m $model_name-f16.gguf -f $llama_cpp_path/groups_merged.txt -o $out_path/imat-f16-gmerged.dat
-```
-
-# Original Model Card
+- gguf & imatrix generated from bf16 for "optimal" accuracy loss (some say this is snake oil, but it can't hurt)
+- Wide coverage of different gguf quant types from Q\_8\_0 down to IQ1\_S (in progress)
+- Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [201cc11afa0a1950e1f632390b2ac6c937a0d8f0](https://github.com/ggerganov/llama.cpp/commit/201cc11afa0a1950e1f632390b2ac6c937a0d8f0)
+- Imatrix generated with [this](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384) multi-purpose dataset.
+```
+./imatrix -c 512 -m $model_name-bf16.gguf -f $llama_cpp_path/groups_merged.txt -o $out_path/imat-bf16-gmerged.dat
+```
+
+# Original Model Card:

 ## Model Summary
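
For anyone reproducing the workflow the updated "Quant Infos" section describes, a minimal end-to-end sketch follows: convert the checkpoint to a bf16 gguf, build the importance matrix, then quantize with it. Only the `imatrix` invocation comes from the card; the conversion and quantize steps, all file names, and the IQ2_M target are illustrative assumptions for a llama.cpp checkout of roughly this commit's vintage (one recent enough that `convert-hf-to-gguf.py` supports `--outtype bf16`).

```
# Sketch only: assumes a built llama.cpp checkout and the original HF
# checkpoint downloaded to ./Phi-3-medium-128k-instruct. All file names
# are placeholders, not the author's exact paths.

# 1. Convert the HF checkpoint to a bf16 gguf
python convert-hf-to-gguf.py ./Phi-3-medium-128k-instruct \
  --outtype bf16 --outfile Phi-3-medium-128k-instruct-bf16.gguf

# 2. Generate the importance matrix (this command is from the card)
./imatrix -c 512 -m Phi-3-medium-128k-instruct-bf16.gguf \
  -f groups_merged.txt -o imat-bf16-gmerged.dat

# 3. Produce a quant using the imatrix (IQ2_M chosen as an example target)
./quantize --imatrix imat-bf16-gmerged.dat \
  Phi-3-medium-128k-instruct-bf16.gguf Phi-3-medium-128k-instruct-IQ2_M.gguf IQ2_M
```

Passing `--imatrix` to `quantize` is what lets the quantizer weight rounding error by activation importance, which matters most for the lowest-bit IQ types the card lists.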