ldp72/Test-SmolLM-Marcel-codecarbon2

@@ -1,13 +1,23 @@
 ---
 library_name: transformers
 tags: []
 ---
-# Model Card for Model ID
 <!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
@@ -15,15 +25,16 @@ tags: []
 <!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
 - **Funded by [optional]:** [More Information Needed]
 - **Shared by [optional]:** [More Information Needed]
 - **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
 - **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
 ### Model Sources [optional]
@@ -41,7 +52,29 @@ This is the model card of a 🤗 transformers model that has been pushed on the
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
 ### Downstream Use [optional]
@@ -75,11 +108,229 @@ Use the code below to get started with the model.
 ## Training Details
 ### Training Data
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
 ### Training Procedure
@@ -89,10 +340,50 @@ Use the code below to get started with the model.
 [More Information Needed]
 #### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 #### Speeds, Sizes, Times [optional]
@@ -144,11 +435,11 @@ Use the code below to get started with the model.
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
 - **Cloud Provider:** [More Information Needed]
 - **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
 ## Technical Specifications [optional]
@@ -196,4 +487,4 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 ## Model Card Contact
-[More Information Needed]

 ---
+# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
+# Doc / guide: https://huggingface.co/docs/hub/model-cards
+base_model:
+- HuggingFaceTB/SmolLM-135M-Instruct
+datasets: []
+language:
+- en
 library_name: transformers
+metrics: []
+pipeline_tag: text-generation
 tags: []
 ---
+# Model Card for ldp72/Test-SmolLM-Marcel-codecarbon2
 <!-- Provide a quick summary of what the model is/does. -->
+This model was finetuned with instruction tuning on Telco domain datasets.
 ## Model Details
 <!-- Provide a longer summary of what this model is. -->
+- **Developed by:** Orange
 - **Funded by [optional]:** [More Information Needed]
 - **Shared by [optional]:** [More Information Needed]
 - **Model type:** [More Information Needed]
+- **Language(s) (NLP):** English
 - **License:** [More Information Needed]
+- **Finetuned from model [optional]:** HuggingFaceTB/SmolLM-135M-Instruct
+- **Date [optional]:** 2025-08-28 16:18:47
 ### Model Sources [optional]
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+This model can be used with the `transformers` library through the `pipeline` abstraction as follows:
+```python
+import torch
+from transformers import pipeline
+
+model_id = "ldp72/Test-SmolLM-Marcel-codecarbon2"
+# Load the model as a chat-style text-generation pipeline.
+pipe = pipeline(
+    "text-generation",
+    model=model_id,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+messages = [
+    {"role": "system", "content": "You are a chatbot specialized in the Telco domain."},
+    {"role": "user", "content": "Can you give a sample of your specialized knowledge?"},
+]
+outputs = pipe(
+    messages,
+    max_new_tokens=256,
+)
+# The last entry of the returned conversation is the assistant's reply.
+print(outputs[0]["generated_text"][-1])
+```
 ### Downstream Use [optional]
 ## Training Details
+This model was finetuned with [Orange internal fine-tuning tools](https://gitlab.tech.orange/NEPAL/knowledge/orangelm/lm-adaptation/), using the Docker image tagged `0.1.2` in the [registry](https://gitlab.tech.orange/NEPAL/knowledge/orangelm/lm-adaptation/container_registry/84664) and the following configuration file:
+```yaml
+data:
+    dataset_name:
+        train:
+        -   path: telco-lm/arxiv-abstract-generation-telco-instructions
+            revision: legacy
+        -   path: telco-lm/synthetic-dsp.stackexchange.com-multi-task-telco-instructions
+            revision: legacy
+        -   path: telco-lm/synthetic-networkengineering.stackexchange.com-multi-task-telco-instructions
+            revision: legacy
+        -   path: telco-lm/synthetic-security.stackexchange.com-multi-task-telco-instructions
+            revision: legacy
+        -   path: telco-lm/synthetic-technical-3gpp-multi-task-telco-instructions
+            revision: legacy
+        -   path: telco-lm/synthetic-technical-5gamericas-multi-task-telco-instructions
+            revision: legacy
+        -   path: telco-lm/synthetic-technical-huawei-multi-task-telco-instructions
+            revision: legacy
+        -   path: telco-lm/synthetic-technical-itu-multi-task-telco-instructions
+            revision: legacy
+        -   path: telco-lm/synthetic-technical-mef-multi-task-telco-instructions
+            revision: legacy
+        -   path: telco-lm/synthetic-technical-ngmn-multi-task-telco-instructions
+            revision: legacy
+        -   path: telco-lm/synthetic-technical-rfc-multi-task-telco-instructions
+            revision: legacy
+        -   path: telco-lm/teleqna-mcqa-cot-telco-instructions
+            revision: legacy
+        -   path: telco-lm/tii-huawei-qa-open-qa-telco-instructions
+            revision: legacy
+        validation_abstract_generation:
+        -   path: telco-lm/arxiv-abstract-generation-telco-instructions
+            revision: legacy
+            split: validation
+        validation_general:
+        -   path: telco-lm/slim-orca-multi-task-general-instructions
+            revision: legacy
+            split: validation
+        validation_synthetic:
+        -   path: telco-lm/synthetic-dsp.stackexchange.com-multi-task-telco-instructions
+            revision: legacy
+            split: validation
+        -   path: telco-lm/synthetic-security.stackexchange.com-multi-task-telco-instructions
+            revision: legacy
+            split: validation
+        -   path: telco-lm/synthetic-networkengineering.stackexchange.com-multi-task-telco-instructions
+            revision: legacy
+            split: validation
+        -   path: telco-lm/synthetic-technical-rfc-multi-task-telco-instructions
+            revision: legacy
+            split: validation
+        -   path: telco-lm/synthetic-technical-3gpp-multi-task-telco-instructions
+            revision: legacy
+            split: validation
+        -   path: telco-lm/synthetic-technical-5gamericas-multi-task-telco-instructions
+            revision: legacy
+            split: validation
+        -   path: telco-lm/synthetic-technical-itu-multi-task-telco-instructions
+            revision: legacy
+            split: validation
+        -   path: telco-lm/synthetic-technical-mef-multi-task-telco-instructions
+            revision: legacy
+            split: validation
+        -   path: telco-lm/synthetic-technical-huawei-multi-task-telco-instructions
+            revision: legacy
+            split: validation
+        -   path: telco-lm/synthetic-technical-ngmn-multi-task-telco-instructions
+            revision: legacy
+            split: validation
+        validation_telco_qa:
+        -   path: telco-lm/tii-huawei-qa-open-qa-telco-instructions
+            revision: legacy
+            split: validation
+        validation_telco_qcm:
+        -   path: telco-lm/teleqna-mcqa-cot-telco-instructions
+            revision: legacy
+            split: validation
+    debug: true
+    implementation_name: instructions
+description:
+    contributors:
+    -   email: [email protected]
+        first_name: Loïc
+        last_name: Fosse
+    -   email: [email protected]
+        first_name: Lionel
+        last_name: Delphin-Poulat
+    -   email: [email protected]
+        first_name: Ismaël
+        last_name: Rousseau
+    domain: Telco
+    languages:
+    - en
+    model_name: ldp72/Test-SmolLM-Marcel-codecarbon2
+image:
+    version: 0.1.2
+model:
+    attn_implementation: flash_attention_2
+    chat_template_tokenizer: HuggingFaceTB/SmolLM-135M-Instruct
+    model_name_or_path: HuggingFaceTB/SmolLM-135M-Instruct
+    trust_remote_code: true
+training:
+    bf16: true
+    dataloader_num_workers: 4
+    dataloader_persistent_workers: true
+    dataloader_pin_memory: true
+    dataloader_prefetch_factor: 2
+    deepspeed: /config/zero3.json
+    disable_tqdm: true
+    eval_accumulation_steps: 1
+    eval_steps: 10
+    eval_strategy: steps
+    fp16: false
+    gradient_accumulation_steps: 2
+    gradient_checkpointing: true
+    group_by_length: false
+    learning_rate: 2.0e-05
+    log_level: debug
+    logging_dir: /outputs/Telco-SmolLM-135-Instruct-it-test-codecarbon-process-push/logs
+    logging_steps: 10
+    lr_scheduler_type: cosine
+    max_grad_norm: 1.0
+    max_steps: -1
+    num_train_epochs: 2
+    optim: paged_adamw_32bit
+    output_dir: /outputs/Telco-SmolLM-135-Instruct-it-test-codecarbon-process-push
+    per_device_eval_batch_size: 2
+    per_device_train_batch_size: 2
+    push_to_hub: false
+    report_to: tensorboard
+    save_steps: 0
+    save_strategy: epoch
+    save_total_limit: 1
+    seed: 42
+    torch_compile: false
+    training_type: instruct-tuning
+    use_liger_kernel: false
+    warmup_ratio: 0.05
+    weight_decay: 0.1
+```
+The model was trained on 1 GPU with at least 40 GB of memory.
+The model was trained using [DeepSpeed](https://www.deepspeed.ai/) with the following configuration file:
+```json
+{
+    "fp16": {
+        "enabled": "auto",
+        "loss_scale": 0,
+        "loss_scale_window": 1000,
+        "initial_scale_power": 16,
+        "hysteresis": 2,
+        "min_loss_scale": 1
+    },
+    "bf16": {
+        "enabled": "auto"
+    },
+    "zero_optimization": {
+        "stage": 3,
+        "offload_optimizer": {
+            "device": "cpu",
+            "pin_memory": true
+        },
+        "offload_param": {
+            "device": "cpu",
+            "pin_memory": true
+        },
+        "overlap_comm": true,
+        "contiguous_gradients": true,
+        "sub_group_size": "1e9",
+        "reduce_bucket_size": "auto",
+        "stage3_prefetch_bucket_size": "auto",
+        "stage3_param_persistence_threshold": "auto",
+        "stage3_max_live_parameters": "1e9",
+        "stage3_max_reuse_distance": "1e9",
+        "stage3_gather_16bit_weights_on_model_save": true
+    },
+    "gradient_accumulation_steps": "auto",
+    "gradient_clipping": "auto",
+    "steps_per_print": 2000,
+    "train_batch_size": "auto",
+    "train_micro_batch_size_per_gpu": "auto",
+    "wall_clock_breakdown": false
+}
+```
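The `"auto"` entries in the DeepSpeed file are normally filled in by the Hugging Face `Trainer` integration from the training arguments. As a rough sketch of the resulting batch arithmetic, using the values from the configuration above (the helper name `effective_batch_size` is illustrative and not part of any library):

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Number of sequences that contribute to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_gpus


# Values taken from the training configuration in this card.
train_batch_size = effective_batch_size(
    per_device_batch=2,  # per_device_train_batch_size
    grad_accum_steps=2,  # gradient_accumulation_steps
    num_gpus=1,          # the card reports a single GPU
)
print(train_batch_size)  # 4
```

Under these assumptions, `train_micro_batch_size_per_gpu` resolves to 2 and `train_batch_size` to 4.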
 ### Training Data
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+This model was trained on the following datasets:
+```yaml
+-   path: telco-lm/arxiv-abstract-generation-telco-instructions
+    revision: legacy
+-   path: telco-lm/synthetic-dsp.stackexchange.com-multi-task-telco-instructions
+    revision: legacy
+-   path: telco-lm/synthetic-networkengineering.stackexchange.com-multi-task-telco-instructions
+    revision: legacy
+-   path: telco-lm/synthetic-security.stackexchange.com-multi-task-telco-instructions
+    revision: legacy
+-   path: telco-lm/synthetic-technical-3gpp-multi-task-telco-instructions
+    revision: legacy
+-   path: telco-lm/synthetic-technical-5gamericas-multi-task-telco-instructions
+    revision: legacy
+-   path: telco-lm/synthetic-technical-huawei-multi-task-telco-instructions
+    revision: legacy
+-   path: telco-lm/synthetic-technical-itu-multi-task-telco-instructions
+    revision: legacy
+-   path: telco-lm/synthetic-technical-mef-multi-task-telco-instructions
+    revision: legacy
+-   path: telco-lm/synthetic-technical-ngmn-multi-task-telco-instructions
+    revision: legacy
+-   path: telco-lm/synthetic-technical-rfc-multi-task-telco-instructions
+    revision: legacy
+-   path: telco-lm/teleqna-mcqa-cot-telco-instructions
+    revision: legacy
+-   path: telco-lm/tii-huawei-qa-open-qa-telco-instructions
+    revision: legacy
+```
 ### Training Procedure
 [More Information Needed]
 #### Training Hyperparameters
+<!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+- **Training regime:** This model was trained with the following hyperparameters for `SFTTrainer`; all other parameters were left at their defaults:
+```yaml
+bf16: true
+dataloader_num_workers: 4
+dataloader_persistent_workers: true
+dataloader_pin_memory: true
+dataloader_prefetch_factor: 2
+deepspeed: /config/zero3.json
+disable_tqdm: true
+eval_accumulation_steps: 1
+eval_steps: 10
+eval_strategy: steps
+fp16: false
+gradient_accumulation_steps: 2
+gradient_checkpointing: true
+group_by_length: false
+learning_rate: 2.0e-05
+log_level: debug
+logging_dir: /outputs/Telco-SmolLM-135-Instruct-it-test-codecarbon-process-push/logs
+logging_steps: 10
+lr_scheduler_type: cosine
+max_grad_norm: 1.0
+max_steps: -1
+num_train_epochs: 2
+optim: paged_adamw_32bit
+output_dir: /outputs/Telco-SmolLM-135-Instruct-it-test-codecarbon-process-push
+per_device_eval_batch_size: 2
+per_device_train_batch_size: 2
+push_to_hub: false
+report_to: tensorboard
+save_steps: 0
+save_strategy: epoch
+save_total_limit: 1
+seed: 42
+torch_compile: false
+use_liger_kernel: false
+warmup_ratio: 0.05
+weight_decay: 0.1
+```
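To illustrate how `learning_rate`, `warmup_ratio`, and `lr_scheduler_type: cosine` combine, here is a minimal sketch of a linear-warmup, cosine-decay schedule. It assumes the usual shape of such a schedule (linear ramp from zero to the peak, then a half-cosine decay to zero) and is not taken from the training tools themselves. The total step count is hypothetical, since the card does not report it.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float = 2.0e-5,
              warmup_ratio: float = 0.05) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical value, not reported in this card
for step in (0, 25, 50, 525, 1000):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")
```

With these assumed numbers the rate peaks at step 50 and falls to half the peak midway through the decay phase.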
 #### Speeds, Sizes, Times [optional]
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** CPUs: AMD EPYC 7282 16-Core Processor; GPUs: 1 x NVIDIA A100-PCIE-40GB
+- **Hours used:** 0:10:44
 - **Cloud Provider:** [More Information Needed]
 - **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** 0.00089 kg CO2eq. Detailed emissions can be found in [`emissions.csv`](./emissions.csv); they were computed using [`codecarbon`](https://codecarbon.io/).
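As a quick sanity check on the figures above, the reported duration and total emissions can be turned into an average emission rate. This is a small sketch using only the two values listed in this card; the helper name `duration_to_hours` is illustrative.

```python
def duration_to_hours(hms: str) -> float:
    """Convert an 'H:MM:SS' string to fractional hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


hours_used = duration_to_hours("0:10:44")  # "Hours used" field
emitted_kg = 0.00089                       # "Carbon Emitted" field, kg CO2eq

# Average emission rate over the run, in grams of CO2eq per hour.
rate_g_per_hour = emitted_kg * 1000 / hours_used
print(f"{hours_used:.3f} h, {rate_g_per_hour:.2f} g CO2eq/h")
```

The run lasted roughly 0.18 hours, which gives an average of about 5 g CO2eq per hour.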
 ## Technical Specifications [optional]
 ## Model Card Contact
+Thanks to [Loïc Fosse](mailto:[email protected]), [Lionel Delphin-Poulat](mailto:[email protected]), [Ismaël Rousseau](mailto:[email protected]) for adding this model.