ldp72 committed on
Commit
06cc06a
·
verified ·
1 Parent(s): 66762d0

docs: add README.md

Files changed (1)
  1. README.md +305 -14
README.md CHANGED
@@ -1,13 +1,23 @@
1
  ---
2
  library_name: transformers
 
 
3
  tags: []
 
4
  ---
5
 
6
- # Model Card for Model ID
7
 
8
  <!-- Provide a quick summary of what the model is/does. -->
9
 
10
-
11
 
12
  ## Model Details
13
 
@@ -15,15 +25,16 @@ tags: []
15
 
16
  <!-- Provide a longer summary of what this model is. -->
17
 
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
 
20
- - **Developed by:** [More Information Needed]
 
21
  - **Funded by [optional]:** [More Information Needed]
22
  - **Shared by [optional]:** [More Information Needed]
23
  - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
  - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
 
27
 
28
  ### Model Sources [optional]
29
 
@@ -41,7 +52,29 @@ This is the model card of a 🤗 transformers model that has been pushed on the
41
 
42
  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
 
44
- [More Information Needed]
45
 
46
  ### Downstream Use [optional]
47
 
@@ -75,11 +108,229 @@ Use the code below to get started with the model.
75
 
76
  ## Training Details
77
 
78
  ### Training Data
79
 
80
  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
 
82
- [More Information Needed]
83
 
84
  ### Training Procedure
85
 
@@ -89,10 +340,50 @@ Use the code below to get started with the model.
89
 
90
  [More Information Needed]
91
 
92
-
93
  #### Training Hyperparameters
94
 
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
 
97
  #### Speeds, Sizes, Times [optional]
98
 
@@ -144,11 +435,11 @@ Use the code below to get started with the model.
144
 
145
  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
 
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
  - **Cloud Provider:** [More Information Needed]
150
  - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
 
153
  ## Technical Specifications [optional]
154
 
@@ -196,4 +487,4 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
196
 
197
  ## Model Card Contact
198
 
199
- [More Information Needed]
 
1
  ---
2
+ # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
3
+ # Doc / guide: https://huggingface.co/docs/hub/model-cards
4
+ base_model:
5
+ - HuggingFaceTB/SmolLM-135M-Instruct
6
+ datasets: []
7
+ languages:
8
+ - en
9
  library_name: transformers
10
+ metrics: []
11
+ pipeline_tag: text-generation
12
  tags: []
13
+
14
  ---
15
 
16
+ # Model Card for ldp72/Test-SmolLM-Marcel-codecarbon2
17
 
18
  <!-- Provide a quick summary of what the model is/does. -->
19
 
20
+ This model was fine-tuned by instruction tuning on Telco-domain datasets.
21
 
22
  ## Model Details
23
 
 
25
 
26
  <!-- Provide a longer summary of what this model is. -->
27
 
 
28
 
29
+
30
+ - **Developed by:** Orange
31
  - **Funded by [optional]:** [More Information Needed]
32
  - **Shared by [optional]:** [More Information Needed]
33
  - **Model type:** [More Information Needed]
34
+ - **Language(s) (NLP):** English
35
  - **License:** [More Information Needed]
36
+ - **Finetuned from model [optional]:** HuggingFaceTB/SmolLM-135M-Instruct
37
+ - **Date [optional]:** 2025-08-28 16:18:47
38
 
39
  ### Model Sources [optional]
40
 
 
52
 
53
  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
54
 
55
+ This model can be used with the `transformers` library through the `pipeline` abstraction as follows:
56
+
57
+ ```python
58
+ import torch
59
+ from transformers import pipeline
60
+
61
+ model_id = "ldp72/Test-SmolLM-Marcel-codecarbon2"
62
+ pipe = pipeline(
63
+ "text-generation",
64
+ model=model_id,
65
+ torch_dtype=torch.bfloat16,
66
+ device_map="auto",
67
+ )
68
+ messages = [
69
+ {"role": "system", "content": "You are a chatbot specialized in the Telco domain."},
70
+ {"role": "user", "content": "Can you give a sample of your specialized knowledge?"},
71
+ ]
72
+ outputs = pipe(
73
+ messages,
74
+ max_new_tokens=256,
75
+ )
76
+ print(outputs[0]["generated_text"][-1])
77
+ ```
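When the pipeline is called with a list of chat messages, recent `transformers` versions return the whole conversation in `generated_text`, so the last element is the new assistant message as a dict with `role` and `content` keys. The sketch below shows one way the reply text could be extracted. `last_reply` and `sample_outputs` are hypothetical: the sample is a hand-written stand-in that only mimics the result structure, and no model is loaded.

```python
def last_reply(outputs):
    """Return the text of the final assistant message in a chat-style pipeline result."""
    conversation = outputs[0]["generated_text"]
    final_message = conversation[-1]
    if final_message.get("role") != "assistant":
        raise ValueError("last message is not an assistant reply")
    return final_message["content"]


# Hand-written stand-in for the value returned by `pipe(messages, ...)` (illustrative only).
sample_outputs = [
    {
        "generated_text": [
            {"role": "system", "content": "<system prompt>"},
            {"role": "user", "content": "<user question>"},
            {"role": "assistant", "content": "<model reply>"},
        ]
    }
]

print(last_reply(sample_outputs))
```

Checking the role before reading `content` guards against the case where generation produced nothing and the last message is still the user turn.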
78
 
79
  ### Downstream Use [optional]
80
 
 
108
 
109
  ## Training Details
110
 
111
+ This model was fine-tuned with [Orange internal fine-tuning tools](https://gitlab.tech.orange/NEPAL/knowledge/orangelm/lm-adaptation/), using the Docker image tagged `0.1.2` in the [registry](https://gitlab.tech.orange/NEPAL/knowledge/orangelm/lm-adaptation/container_registry/84664) and the following configuration file:
112
+
113
+ ```yaml
114
+ data:
115
+ dataset_name:
116
+ train:
117
+ - path: telco-lm/arxiv-abstract-generation-telco-instructions
118
+ revision: legacy
119
+ - path: telco-lm/synthetic-dsp.stackexchange.com-multi-task-telco-instructions
120
+ revision: legacy
121
+ - path: telco-lm/synthetic-networkengineering.stackexchange.com-multi-task-telco-instructions
122
+ revision: legacy
123
+ - path: telco-lm/synthetic-security.stackexchange.com-multi-task-telco-instructions
124
+ revision: legacy
125
+ - path: telco-lm/synthetic-technical-3gpp-multi-task-telco-instructions
126
+ revision: legacy
127
+ - path: telco-lm/synthetic-technical-5gamericas-multi-task-telco-instructions
128
+ revision: legacy
129
+ - path: telco-lm/synthetic-technical-huawei-multi-task-telco-instructions
130
+ revision: legacy
131
+ - path: telco-lm/synthetic-technical-itu-multi-task-telco-instructions
132
+ revision: legacy
133
+ - path: telco-lm/synthetic-technical-mef-multi-task-telco-instructions
134
+ revision: legacy
135
+ - path: telco-lm/synthetic-technical-ngmn-multi-task-telco-instructions
136
+ revision: legacy
137
+ - path: telco-lm/synthetic-technical-rfc-multi-task-telco-instructions
138
+ revision: legacy
139
+ - path: telco-lm/teleqna-mcqa-cot-telco-instructions
140
+ revision: legacy
141
+ - path: telco-lm/tii-huawei-qa-open-qa-telco-instructions
142
+ revision: legacy
143
+ validation_abstract_generation:
144
+ - path: telco-lm/arxiv-abstract-generation-telco-instructions
145
+ revision: legacy
146
+ split: validation
147
+ validation_general:
148
+ - path: telco-lm/slim-orca-multi-task-general-instructions
149
+ revision: legacy
150
+ split: validation
151
+ validation_synthetic:
152
+ - path: telco-lm/synthetic-dsp.stackexchange.com-multi-task-telco-instructions
153
+ revision: legacy
154
+ split: validation
155
+ - path: telco-lm/synthetic-security.stackexchange.com-multi-task-telco-instructions
156
+ revision: legacy
157
+ split: validation
158
+ - path: telco-lm/synthetic-networkengineering.stackexchange.com-multi-task-telco-instructions
159
+ revision: legacy
160
+ split: validation
161
+ - path: telco-lm/synthetic-technical-rfc-multi-task-telco-instructions
162
+ revision: legacy
163
+ split: validation
164
+ - path: telco-lm/synthetic-technical-3gpp-multi-task-telco-instructions
165
+ revision: legacy
166
+ split: validation
167
+ - path: telco-lm/synthetic-technical-5gamericas-multi-task-telco-instructions
168
+ revision: legacy
169
+ split: validation
170
+ - path: telco-lm/synthetic-technical-itu-multi-task-telco-instructions
171
+ revision: legacy
172
+ split: validation
173
+ - path: telco-lm/synthetic-technical-mef-multi-task-telco-instructions
174
+ revision: legacy
175
+ split: validation
176
+ - path: telco-lm/synthetic-technical-huawei-multi-task-telco-instructions
177
+ revision: legacy
178
+ split: validation
179
+ - path: telco-lm/synthetic-technical-ngmn-multi-task-telco-instructions
180
+ revision: legacy
181
+ split: validation
182
+ validation_telco_qa:
183
+ - path: telco-lm/tii-huawei-qa-open-qa-telco-instructions
184
+ revision: legacy
185
+ split: validation
186
+ validation_telco_qcm:
187
+ - path: telco-lm/teleqna-mcqa-cot-telco-instructions
188
+ revision: legacy
189
+ split: validation
190
+ debug: true
191
+ implementation_name: instructions
192
+ description:
193
+ contributors:
194
+ - email: [email protected]
195
+ first_name: Loïc
196
+ last_name: Fosse
197
+ - email: [email protected]
198
+ first_name: Lionel
199
+ last_name: Delphin-Poulat
200
+ - email: [email protected]
201
+ first_name: Ismaël
202
+ last_name: Rousseau
203
+ domain: Telco
204
+ languages:
205
+ - en
206
+ model_name: ldp72/Test-SmolLM-Marcel-codecarbon2
207
+ image:
208
+ version: 0.1.2
209
+ model:
210
+ attn_implementation: flash_attention_2
211
+ chat_template_tokenizer: HuggingFaceTB/SmolLM-135M-Instruct
212
+ model_name_or_path: HuggingFaceTB/SmolLM-135M-Instruct
213
+ trust_remote_code: true
214
+ training:
215
+ bf16: true
216
+ dataloader_num_workers: 4
217
+ dataloader_persistent_workers: true
218
+ dataloader_pin_memory: true
219
+ dataloader_prefetch_factor: 2
220
+ deepspeed: /config/zero3.json
221
+ disable_tqdm: true
222
+ eval_accumulation_steps: 1
223
+ eval_steps: 10
224
+ eval_strategy: steps
225
+ fp16: false
226
+ gradient_accumulation_steps: 2
227
+ gradient_checkpointing: true
228
+ group_by_length: false
229
+ learning_rate: 2.0e-05
230
+ log_level: debug
231
+ logging_dir: /outputs/Telco-SmolLM-135-Instruct-it-test-codecarbon-process-push/logs
232
+ logging_steps: 10
233
+ lr_scheduler_type: cosine
234
+ max_grad_norm: 1.0
235
+ max_steps: -1
236
+ num_train_epochs: 2
237
+ optim: paged_adamw_32bit
238
+ output_dir: /outputs/Telco-SmolLM-135-Instruct-it-test-codecarbon-process-push
239
+ per_device_eval_batch_size: 2
240
+ per_device_train_batch_size: 2
241
+ push_to_hub: false
242
+ report_to: tensorboard
243
+ save_steps: 0
244
+ save_strategy: epoch
245
+ save_total_limit: 1
246
+ seed: 42
247
+ torch_compile: false
248
+ training_type: instruct-tuning
249
+ use_liger_kernel: false
250
+ warmup_ratio: 0.05
251
+ weight_decay: 0.1
252
+ ```
253
+
254
+ The model was trained on 1 GPU with at least 40 GB of memory.
255
+
256
+ The model was trained using [DeepSpeed](https://www.deepspeed.ai/) with the following configuration file:
257
+
258
+ ```json
259
+ {
260
+ "fp16": {
261
+ "enabled": "auto",
262
+ "loss_scale": 0,
263
+ "loss_scale_window": 1000,
264
+ "initial_scale_power": 16,
265
+ "hysteresis": 2,
266
+ "min_loss_scale": 1
267
+ },
268
+ "bf16": {
269
+ "enabled": "auto"
270
+ },
271
+ "zero_optimization": {
272
+ "stage": 3,
273
+ "offload_optimizer": {
274
+ "device": "cpu",
275
+ "pin_memory": true
276
+ },
277
+ "offload_param": {
278
+ "device": "cpu",
279
+ "pin_memory": true
280
+ },
281
+ "overlap_comm": true,
282
+ "contiguous_gradients": true,
283
+ "sub_group_size": "1e9",
284
+ "reduce_bucket_size": "auto",
285
+ "stage3_prefetch_bucket_size": "auto",
286
+ "stage3_param_persistence_threshold": "auto",
287
+ "stage3_max_live_parameters": "1e9",
288
+ "stage3_max_reuse_distance": "1e9",
289
+ "stage3_gather_16bit_weights_on_model_save": true
290
+ },
291
+ "gradient_accumulation_steps": "auto",
292
+ "gradient_clipping": "auto",
293
+ "steps_per_print": 2000,
294
+ "train_batch_size": "auto",
295
+ "train_micro_batch_size_per_gpu": "auto",
296
+ "wall_clock_breakdown": false
297
+ }
298
+ ```
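The `"auto"` entries in this file are placeholders that the Hugging Face `Trainer` DeepSpeed integration fills in from the training arguments at launch. As a rough illustration (this is not the integration's actual code), the sketch below resolves a few of them from the values listed in this card. `resolve_auto` and `world_size` are hypothetical names, and only a subset of the configuration is reproduced.

```python
def resolve_auto(ds_config, args, world_size):
    """Replace selected "auto" placeholders with values derived from training arguments."""
    micro = args["per_device_train_batch_size"]
    accum = args["gradient_accumulation_steps"]
    derived = {
        "train_micro_batch_size_per_gpu": micro,
        "gradient_accumulation_steps": accum,
        # Global batch size = micro batch x accumulation steps x number of GPUs.
        "train_batch_size": micro * accum * world_size,
        "gradient_clipping": args["max_grad_norm"],
    }
    resolved = dict(ds_config)
    for key, value in derived.items():
        if resolved.get(key) == "auto":
            resolved[key] = value
    for precision in ("bf16", "fp16"):
        if precision in resolved:
            section = dict(resolved[precision])
            if section.get("enabled") == "auto":
                section["enabled"] = args[precision]
            resolved[precision] = section
    return resolved


# Subset of the DeepSpeed file and of the training arguments shown in this card.
ds_config = {
    "fp16": {"enabled": "auto"},
    "bf16": {"enabled": "auto"},
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
args = {
    "bf16": True,
    "fp16": False,
    "gradient_accumulation_steps": 2,
    "max_grad_norm": 1.0,
    "per_device_train_batch_size": 2,
}

resolved = resolve_auto(ds_config, args, world_size=1)
print(resolved)
```

With one GPU, a per-device batch size of 2 and 2 accumulation steps, the global batch size works out to 4 sequences per optimizer step.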
299
+
300
  ### Training Data
301
 
302
  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
303
 
304
+ This model was trained on the following datasets:
305
+
306
+ ```yaml
307
+ - path: telco-lm/arxiv-abstract-generation-telco-instructions
308
+ revision: legacy
309
+ - path: telco-lm/synthetic-dsp.stackexchange.com-multi-task-telco-instructions
310
+ revision: legacy
311
+ - path: telco-lm/synthetic-networkengineering.stackexchange.com-multi-task-telco-instructions
312
+ revision: legacy
313
+ - path: telco-lm/synthetic-security.stackexchange.com-multi-task-telco-instructions
314
+ revision: legacy
315
+ - path: telco-lm/synthetic-technical-3gpp-multi-task-telco-instructions
316
+ revision: legacy
317
+ - path: telco-lm/synthetic-technical-5gamericas-multi-task-telco-instructions
318
+ revision: legacy
319
+ - path: telco-lm/synthetic-technical-huawei-multi-task-telco-instructions
320
+ revision: legacy
321
+ - path: telco-lm/synthetic-technical-itu-multi-task-telco-instructions
322
+ revision: legacy
323
+ - path: telco-lm/synthetic-technical-mef-multi-task-telco-instructions
324
+ revision: legacy
325
+ - path: telco-lm/synthetic-technical-ngmn-multi-task-telco-instructions
326
+ revision: legacy
327
+ - path: telco-lm/synthetic-technical-rfc-multi-task-telco-instructions
328
+ revision: legacy
329
+ - path: telco-lm/teleqna-mcqa-cot-telco-instructions
330
+ revision: legacy
331
+ - path: telco-lm/tii-huawei-qa-open-qa-telco-instructions
332
+ revision: legacy
333
+ ```
334
 
335
  ### Training Procedure
336
 
 
340
 
341
  [More Information Needed]
342
 
 
343
  #### Training Hyperparameters
344
 
345
+ <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
346
+
347
+ - **Training regime:** This model was trained with the following hyperparameters for `SFTTrainer`; all other parameters were left at their defaults:
348
+
349
+ ```yaml
350
+ bf16: true
351
+ dataloader_num_workers: 4
352
+ dataloader_persistent_workers: true
353
+ dataloader_pin_memory: true
354
+ dataloader_prefetch_factor: 2
355
+ deepspeed: /config/zero3.json
356
+ disable_tqdm: true
357
+ eval_accumulation_steps: 1
358
+ eval_steps: 10
359
+ eval_strategy: steps
360
+ fp16: false
361
+ gradient_accumulation_steps: 2
362
+ gradient_checkpointing: true
363
+ group_by_length: false
364
+ learning_rate: 2.0e-05
365
+ log_level: debug
366
+ logging_dir: /outputs/Telco-SmolLM-135-Instruct-it-test-codecarbon-process-push/logs
367
+ logging_steps: 10
368
+ lr_scheduler_type: cosine
369
+ max_grad_norm: 1.0
370
+ max_steps: -1
371
+ num_train_epochs: 2
372
+ optim: paged_adamw_32bit
373
+ output_dir: /outputs/Telco-SmolLM-135-Instruct-it-test-codecarbon-process-push
374
+ per_device_eval_batch_size: 2
375
+ per_device_train_batch_size: 2
376
+ push_to_hub: false
377
+ report_to: tensorboard
378
+ save_steps: 0
379
+ save_strategy: epoch
380
+ save_total_limit: 1
381
+ seed: 42
382
+ torch_compile: false
383
+ use_liger_kernel: false
384
+ warmup_ratio: 0.05
385
+ weight_decay: 0.1
386
+ ```
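With `lr_scheduler_type: cosine` and `warmup_ratio: 0.05`, the learning rate ramps up linearly over the first 5% of optimizer steps and then follows a half-cosine decay toward zero. The sketch below reproduces that shape, assuming the usual linear-warmup-plus-cosine formula. `lr_at` is a hypothetical helper and `total_steps = 1000` is a made-up value, since the real number of steps is not stated in this card.

```python
import math


def lr_at(step, total_steps, base_lr=2.0e-05, warmup_ratio=0.05):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; the actual step count depends on dataset size
for step in (0, 25, 50, 525, 1000):
    print(step, lr_at(step, total_steps))
```

The peak value `2.0e-05` is reached at the end of warmup, and the rate is back to half of it midway through the decay phase.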
387
 
388
  #### Speeds, Sizes, Times [optional]
389
 
 
435
 
436
  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
437
 
438
+ - **Hardware Type:** CPUs: AMD EPYC 7282 16-Core Processor; GPUs: 1 x NVIDIA A100-PCIE-40GB
439
+ - **Hours used:** 0:10:44
440
  - **Cloud Provider:** [More Information Needed]
441
  - **Compute Region:** [More Information Needed]
442
+ - **Carbon Emitted:** 0.00089 kg CO2eq. Detailed emissions can be found in [`emissions.csv`](./emissions.csv); they were computed using [`codecarbon`](https://codecarbon.io/).
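To put the reported figures in perspective, the run duration and the total emissions can be combined into an average emission rate. The sketch below does that arithmetic with the two values from this card; `parse_hms` and `grams_per_hour` are hypothetical helper names, and the duration is assumed to be in `h:mm:ss` format.

```python
def parse_hms(text):
    """Convert an 'h:mm:ss' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def grams_per_hour(total_kg, duration_s):
    """Average emission rate in grams of CO2eq per hour."""
    return total_kg * 1000.0 / (duration_s / 3600.0)


duration_s = parse_hms("0:10:44")           # duration reported in this card
rate = grams_per_hour(0.00089, duration_s)  # total emissions reported in this card
print(duration_s, round(rate, 2))
```

This is only an average over the whole run; the per-interval breakdown is in the `emissions.csv` file referenced above.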
443
 
444
  ## Technical Specifications [optional]
445
 
 
487
 
488
  ## Model Card Contact
489
 
490
+ Thanks to [Loïc Fosse](mailto:[email protected]), [Lionel Delphin-Poulat](mailto:[email protected]), [Ismaël Rousseau](mailto:[email protected]) for adding this model.