qwp4w3hyb committed · commit cf9616d · verified · 1 parent: d1925e5

Update README.md

Files changed (1):
  1. README.md (+16 -19)
README.md CHANGED
@@ -1,16 +1,18 @@
 ---
-base_model: microsoft/Phi-3-mini-128k-instruct
 license: mit
-license_link: LICENSE
+license_link: >-
+  https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/LICENSE
 language:
-- en
+- multilingual
 pipeline_tag: text-generation
+base_model: microsoft/Phi-3-medium-128k-instruct
 tags:
 - nlp
 - code
 - microsoft
 - phi
-- phi-3
+- instruct
+- finetune
 - gguf
 - imatrix
 - importance matrix
@@ -18,22 +20,17 @@ tags:

 # Quant Infos

-- The 128k context is not fully supported by llama.cpp yet, but in my testing this model works fine up to 50k+ already
+- Requires latest llama.cpp master;
 - quants done with an importance matrix for improved quantization loss
-- quantized & generated imatrix from the f32 as f16 is inaccurate when converting from bf16
-- K & IQ quants in basically all variants from Q6_K down to IQ1_S
-
-Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [b4e4b8a9351d918a56831c73cf9f25c1837b80d1](https://github.com/ggerganov/llama.cpp/commit/b4e4b8a9351d918a56831c73cf9f25c1837b80d1) (master from 2024-04-24)
-
-Imatrix dataset was used from [here](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
-
-Using this command to generate the importance matrix from the f32.gguf
-
-```
-./imatrix -c 512 -m $model_name-f16.gguf -f $llama_cpp_path/groups_merged.txt -o $out_path/imat-f16-gmerged.dat
-```
-
-# Original Model Card
+- gguf & imatrix generated from bf16 for "optimal" accuracy loss (some say this is snake oil, but it can't hurt)
+- Wide coverage of different gguf quant types from Q\_8\_0 down to IQ1\_S (in progress)
+- Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [201cc11afa0a1950e1f632390b2ac6c937a0d8f0](https://github.com/ggerganov/llama.cpp/commit/201cc11afa0a1950e1f632390b2ac6c937a0d8f0)
+- Imatrix generated with [this](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384) multi-purpose dataset.
+```
+./imatrix -c 512 -m $model_name-bf16.gguf -f $llama_cpp_path/groups_merged.txt -o $out_path/imat-bf16-gmerged.dat
+```
+
+# Original Model Card:

 ## Model Summary
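
For anyone reproducing the workflow the updated "Quant Infos" section describes, a minimal end-to-end sketch follows: convert the checkpoint to a bf16 gguf, build the importance matrix, then quantize with it. Only the `imatrix` invocation comes from the card; the conversion and quantize steps, all file names, and the IQ2_M target are illustrative assumptions for a llama.cpp checkout of roughly this commit's vintage (one recent enough that `convert-hf-to-gguf.py` supports `--outtype bf16`).

```
# Sketch only: assumes a built llama.cpp checkout and the original HF
# checkpoint downloaded to ./Phi-3-medium-128k-instruct. All file names
# are placeholders, not the author's exact paths.

# 1. Convert the HF checkpoint to a bf16 gguf
python convert-hf-to-gguf.py ./Phi-3-medium-128k-instruct \
  --outtype bf16 --outfile Phi-3-medium-128k-instruct-bf16.gguf

# 2. Generate the importance matrix (this command is from the card)
./imatrix -c 512 -m Phi-3-medium-128k-instruct-bf16.gguf \
  -f groups_merged.txt -o imat-bf16-gmerged.dat

# 3. Produce a quant using the imatrix (IQ2_M chosen as an example target)
./quantize --imatrix imat-bf16-gmerged.dat \
  Phi-3-medium-128k-instruct-bf16.gguf Phi-3-medium-128k-instruct-IQ2_M.gguf IQ2_M
```

Passing `--imatrix` to `quantize` is what lets the quantizer weight rounding error by activation importance, which matters most for the lowest-bit IQ types the card lists.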