Commit 5918fc5 by cicdatopea (verified) · 1 parent: db670a9

Update README.md

Files changed (1): README.md (+216 -3)

---
license: apache-2.0
datasets:
- NeelNanda/pile-10k
---

## Model Card Details

This model is an INT4 quantization of [allenai/OLMo-2-1124-7B-Instruct](https://huggingface.co/allenai/OLMo-2-1124-7B-Instruct) with group_size 128 and symmetric quantization, generated by [intel/auto-round](https://github.com/intel/auto-round). Load the model with revision `1cdca16` to use the AutoGPTQ format.

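To make "int4, group_size 128, symmetric" concrete, here is a minimal pure-Python sketch of group-wise symmetric 4-bit quantization. It uses plain round-to-nearest, not the tuned rounding that auto-round learns, and the scale convention (`max_abs / 7`) and the function names are assumptions chosen for illustration only.

```python
def quantize_group_sym_int4(weights, group_size=128):
    """Round-to-nearest symmetric 4-bit quantization with one scale per group."""
    qmin, qmax = -8, 7  # signed 4-bit integer range
    q_values, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Symmetric: no zero point, the scale maps max_abs onto qmax (assumed convention).
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        q_values.extend(max(qmin, min(qmax, round(w / scale))) for w in group)
    return q_values, scales


def dequantize(q_values, scales, group_size=128):
    """Map the 4-bit integers back to floats using each group's scale."""
    return [q * scales[i // group_size] for i, q in enumerate(q_values)]


# Synthetic weights in [-0.5, 0.5]; 256 values give two groups of 128.
weights = [((i * 37) % 101 - 50) / 100.0 for i in range(256)]
q, s = quantize_group_sym_int4(weights)
restored = dequantize(q, s)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(len(s), max_error)
```

Each group of 128 weights shares one floating-point scale, so the per-weight storage is 4 bits plus a small per-group overhead, and the round-to-nearest reconstruction error is bounded by half a scale step.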
## Inference on CPU/HPU/CUDA

`pip3 install "transformers>=4.47"`

HPU: a Docker image with the Gaudi Software Stack is recommended; refer to the script below for environment setup. More details can be found in the [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built).

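As a quick sanity check before running inference, the installed `transformers` version can be compared against the 4.47 minimum stated above. This is a small sketch using only the standard library; `version_tuple` and `meets_minimum` are hypothetical helper names, and the parsing deliberately ignores suffixes such as `dev0`.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '4.47.1' into (4, 47, 1); stop at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="4.47"):
    """Compare versions numerically, so that 4.5 is older than 4.47."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = metadata.version("transformers")
    print(installed, "OK" if meets_minimum(installed) else "is too old, please upgrade")
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```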
```python
from auto_round import AutoHfQuantizer  ## must import for the auto-round format
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "OPEA/OLMo-2-1124-7B-Instruct-int4-sym-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)

model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype='auto',
    device_map="auto",
    ## revision="1cdca16",  ## AutoGPTQ format
)

## import habana_frameworks.torch.core as htcore  ## uncomment for HPU
## import habana_frameworks.torch.hpu as hthpu  ## uncomment for HPU
## model = model.to(torch.bfloat16).to("hpu")  ## uncomment for HPU

prompt = "There is a girl who likes adventure,"
messages = [
    {"role": "system", "content": "You are OLMo 2, a helpful and harmless AI Assistant built by the Allen Institute for AI."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=200,
    do_sample=False  ## change this to align with the official usage
)
## Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

## prompt = "There is a girl who likes adventure,"
## INT4
"""There is a girl who likes adventure,

She's always on the lookout for a new escapade,
Her heart beats with excitement at the thought of the unknown,
Her spirit yearns for the thrill of exploration,

She packs her backpack with essentials,
A map, a compass, and a flashlight,
Her boots are ready for the rugged terrain,
Her spirit is as boundless as the sky.

She embarks on journeys through forests deep and wide,
Climbs mountains with a heart full of pride,
She paddles her kayak through turbulent waters,
And hikes through valleys where the wildflowers bloom.

The girl with the adventurous soul seeks out the hidden gems,
The secret trails, the ancient ruins,
She listens to the whispers of the wind,
And follows the call of the distant drum.

Her adventures are not just about the destination,
But the experiences she gathers along the way,
The stories
"""

## BF16
"""There is a girl who likes adventure,

She dreams of far-off lands and distant shores,
Of climbing mountains high and exploring caves,
Her heart beats fast with excitement at the thought
Of the unknown paths that lie beyond the maps.

She packs her backpack with essentials and more,
A compass, a flashlight, and a book or two,
Her spirit eager, her eyes wide with wonder,
As she sets out on her journey anew.

The girl with the adventurous soul embarks
On quests that challenge her mind and her might,
She learns to navigate by the stars above,
And finds joy in the beauty of the night.

Through forests deep and rivers wide she roams,
Each step a story, each experience a treasure,
Her courage grows with every challenge faced,
And she discovers the strength she never knew she had.

The girl who likes adventure, with each passing day,
Grows wiser"""

## prompt = "Which one is larger, 9.11 or 9.8"
## INT4
"""9.8 is larger than 9.11.
"""

## BF16
"""9.8 is larger than 9.11. To compare these two numbers, you can simply look at their decimal places. Since 9.8 has a higher decimal value (0.8) compared to 9.11 (which has a decimal value of 0.11), 9.8 is the larger number.
"""

## prompt = "How many r in strawberry."
## INT4
"""There are two 'r's in "strawberry."
"""
## BF16
"""There are 2 'r's in "strawberry."""


## prompt = "Once upon a time,"
## INT4
"""Once upon a time, in a world where technology and imagination intertwined, there existed an AI named OLMo 2. Created by the brilliant minds at the Allen Institute for AI, OLMo 2 was more than just lines of code; it was a beacon of knowledge and a guardian of information.

OLMo 2's design was sleek and modern, with a digital interface that shimmered like a starlit sky. Its voice was soothing, a harmonious blend of tones that could calm the most restless of souls. With a vast database at its disposal, OLMo 2 was capable of answering any question, no matter how obscure or complex.

Every day, people from all walks of life would seek the wisdom of OLMo 2. Students would ask about the intricacies of quantum physics, while artists would inquire about the history of their favorite art movements. Parents would consult OLMo 2 for advice on raising children, and travelers would ask for
"""

## BF16
"""Once upon a time, in a world where imagination knew no bounds, there existed a land filled with wonder and mystery. This land was called Lumina, a place where the sky shimmered with the colors of a thousand sunsets, and the forests whispered ancient secrets to those who dared to listen.

In Lumina, there lived a young girl named Elara. She had hair as golden as the sun and eyes that held the depth of the ocean. Elara possessed a heart full of curiosity and a spirit unyielding in the face of adventure. Her home was a quaint cottage nestled at the edge of the Whispering Woods, a place where the trees seemed to dance in the wind, sharing tales of long-forgotten times.

One day, as the first light of dawn painted the sky in hues of pink and orange, Elara received a mysterious letter. The envelope was sealed with wax that bore the crest of the forgotten kingdom of Aetheria. Intrigued
"""

```

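The list comprehension in the script above strips the prompt from each generated sequence, because for a decoder-only model `generate` returns the prompt tokens followed by the continuation. The toy example below shows the same slicing on plain Python lists; the token ids are made-up integers, not real OLMo 2 vocabulary entries.

```python
# One prompt of four tokens, and the matching output: prompt plus three new tokens.
prompt_batch = [[101, 7, 8, 9]]
output_batch = [[101, 7, 8, 9, 42, 43, 44]]

# Slice off len(prompt) tokens from the front of each output sequence.
new_tokens = [out[len(inp):] for inp, out in zip(prompt_batch, output_batch)]
print(new_tokens)  # [[42, 43, 44]]
```

Without this step, the decoded response would repeat the system message and user prompt before the model's answer.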
### Evaluate the model

`pip3 install lm-eval==0.4.5`

```bash
auto-round --eval --model_name "OPEA/OLMo-2-1124-7B-Instruct-int4-sym-inc" --eval_bs 16 --tasks leaderboard_mmlu_pro,leaderboard_ifeval,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k
```

| Metric                        | BF16                     | INT4                     |
| ----------------------------- | ------------------------ | ------------------------ |
| avg                           | 0.6284                   | 0.6316                   |
| leaderboard_mmlu_pro (5-shot) | 0.2975                   | 0.2931                   |
| leaderboard_ifeval            | 0.5815=(0.6379+0.5250)/2 | 0.6073=(0.6619+0.5527)/2 |
| lambada_openai                | 0.6967                   | 0.6959                   |
| hellaswag                     | 0.6585                   | 0.6537                   |
| winogrande                    | 0.7174                   | 0.7206                   |
| piqa                          | 0.8047                   | 0.8118                   |
| truthfulqa_mc1                | 0.3758                   | 0.3807                   |
| openbookqa                    | 0.4020                   | 0.4060                   |
| boolq                         | 0.8450                   | 0.8535                   |
| arc_easy                      | 0.8384                   | 0.8321                   |
| arc_challenge                 | 0.5648                   | 0.5742                   |
| gsm8k (5-shot, strict match)  | 0.7582                   | 0.7498                   |

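The `avg` row appears to be the unweighted mean of the twelve per-task scores listed in the table. The short check below recomputes it from the table values; the dictionary names are just for illustration.

```python
# Per-task scores copied from the table above (the avg row itself is excluded).
bf16 = {
    "leaderboard_mmlu_pro": 0.2975, "leaderboard_ifeval": 0.5815,
    "lambada_openai": 0.6967, "hellaswag": 0.6585, "winogrande": 0.7174,
    "piqa": 0.8047, "truthfulqa_mc1": 0.3758, "openbookqa": 0.4020,
    "boolq": 0.8450, "arc_easy": 0.8384, "arc_challenge": 0.5648,
    "gsm8k": 0.7582,
}
int4 = {
    "leaderboard_mmlu_pro": 0.2931, "leaderboard_ifeval": 0.6073,
    "lambada_openai": 0.6959, "hellaswag": 0.6537, "winogrande": 0.7206,
    "piqa": 0.8118, "truthfulqa_mc1": 0.3807, "openbookqa": 0.4060,
    "boolq": 0.8535, "arc_easy": 0.8321, "arc_challenge": 0.5742,
    "gsm8k": 0.7498,
}


def mean(scores):
    """Unweighted mean over all tasks."""
    return sum(scores.values()) / len(scores)


print(f"BF16 avg: {mean(bf16):.4f}")  # BF16 avg: 0.6284
print(f"INT4 avg: {mean(int4):.4f}")  # INT4 avg: 0.6316
```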
## Reproduce the model

Here is a sample command to generate the model:

```bash
auto-round \
    --model allenai/OLMo-2-1124-7B-Instruct \
    --device 0 \
    --nsamples 512 \
    --model_dtype "fp16" \
    --iter 1000 \
    --disable_eval \
    --format 'auto_gptq,auto_round' \
    --output_dir "./tmp_autoround"
```

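For readers who prefer Python over the CLI, a rough equivalent using auto-round's `AutoRound` class might look like the sketch below. It is an assumption that the keyword arguments (`bits`, `group_size`, `sym`, `iters`, `nsamples`) map one-to-one onto the flags above in the installed auto-round version, and `quantize_olmo2` is a hypothetical wrapper name; check the auto-round documentation before relying on it. The imports sit inside the function so that defining it does not require the libraries or a GPU.

```python
def quantize_olmo2(output_dir="./tmp_autoround"):
    """Sketch of the CLI command above via the Python API (argument names assumed)."""
    import torch
    from auto_round import AutoRound
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "allenai/OLMo-2-1124-7B-Instruct"
    # Mirrors --model_dtype "fp16".
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    autoround = AutoRound(
        model,
        tokenizer,
        bits=4,          # int4
        group_size=128,  # one scale per 128 weights
        sym=True,        # symmetric quantization
        iters=1000,      # assumed to mirror --iter 1000
        nsamples=512,    # assumed to mirror --nsamples 512
    )
    autoround.quantize()
    # Only the auto_round format is exported here; the CLI command also exports auto_gptq.
    autoround.save_quantized(output_dir, format="auto_round")
```

Calling `quantize_olmo2()` downloads the full-precision model and runs the tuning loop, so it needs the same hardware as the CLI command.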
## Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

## Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

- Intel Neural Compressor [link](https://github.com/intel/neural-compressor)

## Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

## Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)