---
library_name: transformers
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B
tags:
- oumi
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrachat_200k
model-index:
- name: Llama-3-8B-UltraChat-200K-Oumi
  results: []
---
[<img src="https://github.com/oumi-ai/oumi/blob/main/docs/_static/logo/header_logo.png?raw=true" alt="Built with Oumi" width="200" height="60"/>](https://github.com/oumi-ai/oumi)
<details><summary>See oumi train config</summary>

oumi version: `0.1.3`
```yaml
data:
  train:
    datasets:
    - dataset_name: HuggingFaceH4/ultrachat_200k
      dataset_path: null
      subset: null
      split: train_sft
      dataset_kwargs: {}
      sample_count: null
      mixture_proportion: null
      shuffle: false
      seed: null
      shuffle_buffer_size: 1000
      trust_remote_code: true
      transform_num_workers: null
    collator_name: null
    pack: false
    stream: false
    target_col: null
    mixture_strategy: first_exhausted
    seed: null
    use_async_dataset: false
    use_torchdata: null
  test:
    datasets: []
    collator_name: null
    pack: false
    stream: false
    target_col: null
    mixture_strategy: first_exhausted
    seed: null
    use_async_dataset: false
    use_torchdata: null
  validation:
    datasets: []
    collator_name: null
    pack: false
    stream: false
    target_col: null
    mixture_strategy: first_exhausted
    seed: null
    use_async_dataset: false
    use_torchdata: null
model:
  model_name: meta-llama/Meta-Llama-3.1-8B
  adapter_model: null
  tokenizer_name: null
  tokenizer_pad_token: null
  tokenizer_kwargs: {}
  model_max_length: 8192
  load_pretrained_weights: true
  trust_remote_code: true
  torch_dtype_str: bfloat16
  compile: false
  chat_template: llama3-instruct
  attn_implementation: flash_attention_2
  device_map: auto
  model_kwargs: {}
  enable_liger_kernel: true
  shard_for_eval: false
  freeze_layers: []
training:
  use_peft: false
  trainer_type: TRL_SFT
  enable_gradient_checkpointing: true
  gradient_checkpointing_kwargs:
    use_reentrant: false
  output_dir: output/llama8b-ultrachat
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 8
  gradient_accumulation_steps: 8
  max_steps: -1
  num_train_epochs: 1
  save_epoch: false
  save_steps: 800
  save_final_model: true
  seed: 42
  run_name: llama8b-ultrachat.sky-2025-01-30-21-19-10-053582_sky-e018-bf996_1
  metrics_function: null
  log_level: info
  dep_log_level: warning
  enable_wandb: true
  enable_tensorboard: true
  logging_strategy: steps
  logging_dir: null
  logging_steps: 100
  logging_first_step: false
  eval_strategy: 'no'
  eval_steps: 500
  learning_rate: 2.0e-05
  lr_scheduler_type: linear
  lr_scheduler_kwargs: {}
  warmup_ratio: null
  warmup_steps: null
  optimizer: paged_adamw_8bit
  weight_decay: 0.0
  adam_beta1: 0.9
  adam_beta2: 0.999
  adam_epsilon: 1.0e-08
  sgd_momentum: 0.0
  mixed_precision_dtype: NONE
  compile: false
  include_performance_metrics: true
  include_alternative_mfu_metrics: false
  log_model_summary: false
  resume_from_checkpoint: null
  try_resume_from_last_checkpoint: false
  dataloader_num_workers: 8
  dataloader_prefetch_factor: 32
  dataloader_main_process_only: null
  ddp_find_unused_parameters: false
  max_grad_norm: 1.0
  trainer_kwargs:
    max_seq_length: 8192
  profiler:
    save_dir: null
    enable_cpu_profiling: false
    enable_cuda_profiling: false
    record_shapes: false
    profile_memory: false
    with_stack: false
    with_flops: false
    with_modules: false
    row_limit: 50
    schedule:
      enable_schedule: false
      wait: 0
      warmup: 1
      active: 3
      repeat: 1
      skip_first: 1
  telemetry:
    telemetry_dir: telemetry
    collect_telemetry_for_all_ranks: false
    track_gpu_temperature: false
  empty_device_cache_steps: 50
  nccl_default_timeout_minutes: null
peft:
  lora_r: 8
  lora_alpha: 8
  lora_dropout: 0.0
  lora_target_modules: null
  lora_modules_to_save: null
  lora_bias: none
  lora_init_weights: DEFAULT
  lora_task_type: CAUSAL_LM
  q_lora: false
  q_lora_bits: 4
  bnb_4bit_quant_type: fp4
  use_bnb_nested_quant: false
  bnb_4bit_quant_storage: uint8
  bnb_4bit_compute_dtype: float32
  peft_save_mode: ADAPTER_ONLY
fsdp:
  enable_fsdp: false
  sharding_strategy: FULL_SHARD
  cpu_offload: false
  mixed_precision: null
  backward_prefetch: BACKWARD_PRE
  forward_prefetch: false
  use_orig_params: null
  state_dict_type: FULL_STATE_DICT
  auto_wrap_policy: NO_WRAP
  min_num_params: 100000
  transformer_layer_cls: null
  sync_module_states: true
```

</details><br>

<details><summary>See oumi cloud config</summary>

```yaml
name: llama8b-ultrachat-sft

num_nodes: 1
resources:
  cloud: gcp
  accelerators: "A100-80GB:4"
  use_spot: false
  disk_size: 2000 # Disk size in GBs

working_dir: .

file_mounts:
  ~/.netrc: ~/.netrc  # WandB credentials
  # Mount HF token, which is needed to download locked-down models from HF Hub.
  # This is created on the local machine by running `huggingface-cli login`.
  ~/.cache/huggingface/token: ~/.cache/huggingface/token

envs:
  WANDB_PROJECT: oumi-train
  OUMI_RUN_NAME: llama8b-ultrachat
  OUMI_USER_NAME: penfever
  ACCELERATE_LOG_LEVEL: info
  # https://github.com/huggingface/tokenizers/issues/899#issuecomment-1027739758
  TOKENIZERS_PARALLELISM: false
setup: |
  set -e
  pip install uv && uv pip install -e .[gpu,evaluation] hf_transfer
  # Install model from HF Hub. This tool increases download speed compared to
  # downloading the model during training.
  HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download meta-llama/Meta-Llama-3.1-8B --exclude original/*
  pip install -U flash-attn --no-build-isolation

run: |
  set -e  # Exit if any command failed.
  source ./configs/examples/misc/sky_init.sh

  set -x
  oumi distributed torchrun \
    -m oumi train \
    -c configs/recipes/llama3_1/sft/8b_full/base_ultrachat.yaml \
    --training.run_name "${OUMI_RUN_NAME}.${SKYPILOT_TASK_ID}" \

  echo "Node ${SKYPILOT_NODE_RANK} is all done!"
```

</details><br>

# Llama-3-8B-UltraChat-200K-Oumi

This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) on the [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset. It achieves a training loss of 1.0435.
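
For quick testing, the following is a minimal inference sketch using the Hugging Face `transformers` API. The Hub repository ID below is a placeholder (this card does not state the final repository path), and the generation settings are illustrative only.

```python
# Minimal inference sketch. Assumption: the repo ID below is a placeholder;
# replace it with the actual Hub path of this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "oumi-ai/Llama-3-8B-UltraChat-200K-Oumi"  # placeholder repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 dtype used for training
    device_map="auto",
)

# The model was fine-tuned with the llama3-instruct chat template (see the config above),
# so prompts should go through apply_chat_template rather than being passed as raw text.
messages = [{"role": "user", "content": "Explain supervised fine-tuning in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```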

## Model description

This model was trained as a partial reproduction of results from the recent [`WildChat-50M` paper](https://arxiv.org/abs/2501.18511).

```bibtex
@misc{feuer2025wildchat50mdeepdiverole,
      title={WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training}, 
      author={Benjamin Feuer and Chinmay Hegde},
      year={2025},
      eprint={2501.18511},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2501.18511}, 
}
```

## Intended uses & limitations

This model is intended for research use; it has not received any safety-oriented post-training.

## Artifacts

The following artifacts may be present in this repository; each is briefly described below.

### Logs

Contains logs from the training process, one for each rank.

### Telemetry

`devices_info.txt`: Information about the devices used to train the model.

`telemetry_callback_metrics.json`: Metrics from the training process, such as loss and the number of tokens seen.

`telemetry_callback_wandb.json`: Weights & Biases run parameters.

`telemetry_callback.json`: Metadata such as total training time and the number of epochs trained.

`training_config.yaml`: The training configuration used to train the model (also reproduced in this README).

`world_size.json`: The world size (number of training processes) used to train the model.
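
The JSON telemetry files can be inspected with the Python standard library. The sketch below assumes only the filenames listed above and the `telemetry` directory name from the training config; it makes no assumption about each file's internal schema.

```python
# Minimal sketch for inspecting the telemetry artifacts listed above.
# Assumes only the filenames; the internal schema is printed rather than assumed.
import json
from pathlib import Path

telemetry_dir = Path("telemetry")  # telemetry_dir from the training config above

for name in [
    "telemetry_callback_metrics.json",
    "telemetry_callback_wandb.json",
    "telemetry_callback.json",
    "world_size.json",
]:
    path = telemetry_dir / name
    if not path.exists():
        continue
    with path.open() as f:
        data = json.load(f)
    # Print the top-level keys for dicts, otherwise the value itself.
    print(name, "->", list(data) if isinstance(data, dict) else data)
```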

## Datasets

Summary statistics about the datasets used to train this model.

### HuggingFaceH4/ultrachat_200k

`Split`: train_sft

`Version`: 0.0.0

`Dataset size`: 3,047,427,114 bytes (~3.05 GB)

`Download size`: 1,624,049,723 bytes (~1.62 GB)

`Total size (dataset + download)`: 4,671,476,837 bytes (~4.67 GB)

`Rows`: 207,865

`Columns`: ['prompt', 'prompt_id', 'messages']
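
These statistics can be reproduced by loading the same split with the Hugging Face `datasets` library; the sketch below assumes only the dataset name and split used in the training config above.

```python
# Minimal sketch: load the SFT split used for training and check its size and columns.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

print(len(ds))               # expected: 207,865 rows
print(ds.column_names)       # expected: ['prompt', 'prompt_id', 'messages']
print(ds[0]["messages"][0])  # first turn of the first conversation
```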

## Results

### Training Loss

| Training Loss | Epoch | Tokens Seen |
|:-------------:|:-----:|:-----------:|
| 1.043         | 0.999 | 246M        |
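
For context, the effective batch size follows directly from the training and cloud configs above (per-device batch size 1, gradient accumulation 8, 4 GPUs); the sketch below is back-of-the-envelope arithmetic, and the average sequence length is only an approximation derived from the rounded token count.

```python
# Back-of-the-envelope sketch based on the configs and table above.
per_device_batch_size = 1   # training.per_device_train_batch_size
gradient_accumulation = 8   # training.gradient_accumulation_steps
num_gpus = 4                # accelerators: "A100-80GB:4" in the cloud config

effective_batch_size = per_device_batch_size * gradient_accumulation * num_gpus
print(effective_batch_size)  # 32 sequences per optimizer step

# Approximate average sequence length implied by the totals above.
tokens_seen = 246e6          # ~246M tokens over one epoch
rows = 207_865               # train_sft examples
print(round(tokens_seen / rows))  # ~1183 tokens per example (rough estimate)
```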

### Evaluation

Following the paper, benchmark results are reported using [Evalchemy](https://github.com/mlfoundations/evalchemy/); for more details on the evaluation metrics, please refer to the [paper](https://arxiv.org/abs/2501.18511). We compare against [this baseline model](https://huggingface.co/tanliboy/zephyr-llama-3-8b-sft), which is also used in the paper.

| Metric | Oumi Repro | Baseline |
|---------|--------|----------|
| MTBench | 5.2313 | 5.0187 |
| Alpaca Eval (LC) | 1.6157 | 4.1260 |
| BBH | 0.4861 | 0.4845 |
| GPQA | 0.2903 | 0.3204 |
| MATH | 0.0552 | 0.0458 |
| MUSR | 0.4116 | 0.3917 |
| IFEval (Prompt Level, Strict) | 0.1978 | 0.2643 |
| MMLU Pro | 0.3118 | 0.3198 |
| MixEval | 0.5935 | 0.63 |
| Average | 0.321 | 0.333 |