---
library_name: transformers
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B
tags:
- oumi
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrachat_200k
model-index:
- name: Llama-3-8B-UltraChat-200K-Oumi
results: []
---
[Oumi](https://github.com/oumi-ai/oumi)
Oumi train config (oumi version `0.1.3`):
```yaml
data:
  train:
    datasets:
      - dataset_name: HuggingFaceH4/ultrachat_200k
        dataset_path: null
        subset: null
        split: train_sft
        dataset_kwargs: {}
        sample_count: null
        mixture_proportion: null
        shuffle: false
        seed: null
        shuffle_buffer_size: 1000
        trust_remote_code: true
        transform_num_workers: null
    collator_name: null
    pack: false
    stream: false
    target_col: null
    mixture_strategy: first_exhausted
    seed: null
    use_async_dataset: false
    use_torchdata: null
  test:
    datasets: []
    collator_name: null
    pack: false
    stream: false
    target_col: null
    mixture_strategy: first_exhausted
    seed: null
    use_async_dataset: false
    use_torchdata: null
  validation:
    datasets: []
    collator_name: null
    pack: false
    stream: false
    target_col: null
    mixture_strategy: first_exhausted
    seed: null
    use_async_dataset: false
    use_torchdata: null
model:
  model_name: meta-llama/Meta-Llama-3.1-8B
  adapter_model: null
  tokenizer_name: null
  tokenizer_pad_token: null
  tokenizer_kwargs: {}
  model_max_length: 8192
  load_pretrained_weights: true
  trust_remote_code: true
  torch_dtype_str: bfloat16
  compile: false
  chat_template: llama3-instruct
  attn_implementation: flash_attention_2
  device_map: auto
  model_kwargs: {}
  enable_liger_kernel: true
  shard_for_eval: false
  freeze_layers: []
training:
  use_peft: false
  trainer_type: TRL_SFT
  enable_gradient_checkpointing: true
  gradient_checkpointing_kwargs:
    use_reentrant: false
  output_dir: output/llama8b-ultrachat
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 8
  gradient_accumulation_steps: 8
  max_steps: -1
  num_train_epochs: 1
  save_epoch: false
  save_steps: 800
  save_final_model: true
  seed: 42
  run_name: llama8b-ultrachat.sky-2025-01-30-21-19-10-053582_sky-e018-bf996_1
  metrics_function: null
  log_level: info
  dep_log_level: warning
  enable_wandb: true
  enable_tensorboard: true
  logging_strategy: steps
  logging_dir: null
  logging_steps: 100
  logging_first_step: false
  eval_strategy: 'no'
  eval_steps: 500
  learning_rate: 2.0e-05
  lr_scheduler_type: linear
  lr_scheduler_kwargs: {}
  warmup_ratio: null
  warmup_steps: null
  optimizer: paged_adamw_8bit
  weight_decay: 0.0
  adam_beta1: 0.9
  adam_beta2: 0.999
  adam_epsilon: 1.0e-08
  sgd_momentum: 0.0
  mixed_precision_dtype: NONE
  compile: false
  include_performance_metrics: true
  include_alternative_mfu_metrics: false
  log_model_summary: false
  resume_from_checkpoint: null
  try_resume_from_last_checkpoint: false
  dataloader_num_workers: 8
  dataloader_prefetch_factor: 32
  dataloader_main_process_only: null
  ddp_find_unused_parameters: false
  max_grad_norm: 1.0
  trainer_kwargs:
    max_seq_length: 8192
  profiler:
    save_dir: null
    enable_cpu_profiling: false
    enable_cuda_profiling: false
    record_shapes: false
    profile_memory: false
    with_stack: false
    with_flops: false
    with_modules: false
    row_limit: 50
    schedule:
      enable_schedule: false
      wait: 0
      warmup: 1
      active: 3
      repeat: 1
      skip_first: 1
  telemetry:
    telemetry_dir: telemetry
    collect_telemetry_for_all_ranks: false
    track_gpu_temperature: false
  empty_device_cache_steps: 50
  nccl_default_timeout_minutes: null
peft:
  lora_r: 8
  lora_alpha: 8
  lora_dropout: 0.0
  lora_target_modules: null
  lora_modules_to_save: null
  lora_bias: none
  lora_init_weights: DEFAULT
  lora_task_type: CAUSAL_LM
  q_lora: false
  q_lora_bits: 4
  bnb_4bit_quant_type: fp4
  use_bnb_nested_quant: false
  bnb_4bit_quant_storage: uint8
  bnb_4bit_compute_dtype: float32
  peft_save_mode: ADAPTER_ONLY
fsdp:
  enable_fsdp: false
  sharding_strategy: FULL_SHARD
  cpu_offload: false
  mixed_precision: null
  backward_prefetch: BACKWARD_PRE
  forward_prefetch: false
  use_orig_params: null
  state_dict_type: FULL_STATE_DICT
  auto_wrap_policy: NO_WRAP
  min_num_params: 100000
  transformer_layer_cls: null
  sync_module_states: true
```
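For reference, the per-device batch size of 1 combined with 8 gradient-accumulation steps on the 4×A100 node described in the cloud config below implies an effective global batch size of 32 sequences per optimizer step. A minimal sketch of that arithmetic (not part of the Oumi codebase; the GPU count is taken from the `accelerators` field below):

```python
# Sketch: effective global batch size implied by the configs in this card.
per_device_train_batch_size = 1   # from the training config above
gradient_accumulation_steps = 8   # from the training config above
num_gpus = 4                      # "A100-80GB:4" in the cloud config below

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # 32 sequences per optimizer step
```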
Oumi cloud config:
```yaml
name: llama8b-ultrachat-sft
num_nodes: 1
resources:
  cloud: gcp
  accelerators: "A100-80GB:4"
  use_spot: false
  disk_size: 2000  # Disk size in GBs
working_dir: .
file_mounts:
  ~/.netrc: ~/.netrc  # WandB credentials
  # Mount the HF token, which is needed to download gated models from HF Hub.
  # It is created on the local machine by running `huggingface-cli login`.
  ~/.cache/huggingface/token: ~/.cache/huggingface/token
envs:
  WANDB_PROJECT: oumi-train
  OUMI_RUN_NAME: llama8b-ultrachat
  OUMI_USER_NAME: penfever
  ACCELERATE_LOG_LEVEL: info
  # https://github.com/huggingface/tokenizers/issues/899#issuecomment-1027739758
  TOKENIZERS_PARALLELISM: false
setup: |
  set -e
  pip install uv && uv pip install -e .[gpu,evaluation] hf_transfer
  # Download the model from HF Hub ahead of time; hf_transfer speeds this up
  # compared to downloading the model during training.
  HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download meta-llama/Meta-Llama-3.1-8B --exclude original/*
  pip install -U flash-attn --no-build-isolation
run: |
  set -e  # Exit if any command fails.
  source ./configs/examples/misc/sky_init.sh
  set -x
  oumi distributed torchrun \
    -m oumi train \
    -c configs/recipes/llama3_1/sft/8b_full/base_ultrachat.yaml \
    --training.run_name "${OUMI_RUN_NAME}.${SKYPILOT_TASK_ID}"

  echo "Node ${SKYPILOT_NODE_RANK} is all done!"
```
# Llama-3-8B-UltraChat-200K-Oumi
This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) on the HuggingFaceH4/ultrachat_200k dataset. It achieves a training loss of 1.0435.
## Model description
This model was trained as a partial reproduction of results from the recent [`WildChat-50M` paper](https://arxiv.org/abs/2501.18511).
```bibtex
@misc{feuer2025wildchat50mdeepdiverole,
title={WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training},
author={Benjamin Feuer and Chinmay Hegde},
year={2025},
eprint={2501.18511},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2501.18511},
}
```
## Intended uses & limitations
This model is intended for research use; it has not received any safety-oriented post-training.
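For research experimentation, the model can be loaded like any other `transformers` causal LM. A minimal generation sketch, assuming the repository id `penfever/Llama-3-8B-UltraChat-200K-Oumi` (inferred from the model name and `OUMI_USER_NAME` above; substitute the actual repo id), that the saved tokenizer carries the `llama3-instruct` chat template from the training config, and a GPU with enough memory:

```python
# Minimal inference sketch; the repo id is an assumption based on this card --
# replace it with the actual repository id before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "penfever/Llama-3-8B-UltraChat-200K-Oumi"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The model was fine-tuned with the llama3-instruct chat template (see config above).
messages = [{"role": "user", "content": "Explain gradient checkpointing in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```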
## Artifacts
The following artifacts may be present in this repository, along with a brief description of what each contains.
### Logs
Contains logs from the training process, one for each rank.
### Telemetry
- `devices_info.txt`: Information about the devices used to train the model.
- `telemetry_callback_metrics.json`: Metrics from the training process, such as loss and number of tokens seen.
- `telemetry_callback_wandb.json`: Weights & Biases run parameters.
- `telemetry_callback.json`: Metadata such as total training time and number of epochs trained.
- `training_config.yaml`: The training configuration used to train the model (also reproduced in this README).
- `world_size.json`: The world size (number of processes) used to train the model.
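These JSON artifacts can be inspected programmatically. A small sketch (the exact schema of each file is not documented in this card, so it only lists top-level keys):

```python
# Sketch: inspect the telemetry JSON artifacts shipped with this repo.
import json
from pathlib import Path

telemetry_dir = Path("telemetry")  # matches `telemetry_dir` in the training config
for name in [
    "telemetry_callback_metrics.json",
    "telemetry_callback_wandb.json",
    "telemetry_callback.json",
    "world_size.json",
]:
    path = telemetry_dir / name
    if path.exists():
        data = json.loads(path.read_text())
        summary = list(data) if isinstance(data, dict) else f"{type(data).__name__}[{len(data)}]"
        print(name, "->", summary)
```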
## Datasets
Summary statistics about the datasets used to train this model.
### HuggingFaceH4/ultrachat_200k
- `Split`: train_sft
- `Version`: 0.0.0
- `Dataset size`: 3047427114 bytes
- `Download size`: 1624049723 bytes
- `Size`: 4671476837 bytes
- `Rows`: 207865
- `Columns`: ['prompt', 'prompt_id', 'messages']
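For reference, the row count and column names can be checked with the `datasets` library (a sketch; byte counts may differ slightly across dataset revisions):

```python
# Sketch: verify the dataset statistics above with the `datasets` library.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
print(ds.num_rows)       # 207865
print(ds.column_names)   # ['prompt', 'prompt_id', 'messages']
print(ds.info.dataset_size, ds.info.download_size)  # sizes in bytes
```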
## Results
### Training Loss
| Training Loss | Epoch | Tokens Seen |
|:-------------:|:-----:|:-----------:|
| 1.043         | 0.999 | 246M        |
### Evaluation
Following the paper, our benchmark results are reported using [Evalchemy](https://github.com/mlfoundations/evalchemy/). For more details on the evaluation metrics, please refer to the [paper](https://arxiv.org/abs/2501.18511). We compare to [this baseline model](https://huggingface.co/tanliboy/zephyr-llama-3-8b-sft) used in the paper.
| Metric | Oumi Repro | Baseline |
|---------|--------|----------|
| MTBench | 5.2313 | 5.0187 |
| Alpaca Eval (LC) | 1.6157 | 4.1260 |
| BBH | 0.4861 | 0.4845 |
| GPQA | 0.2903 | 0.3204 |
| MATH | 0.0552 | 0.0458 |
| MUSR | 0.4116 | 0.3917 |
| IFEval (Prompt Level, Strict) | 0.1978 | 0.2643 |
| MMLU Pro | 0.3118 | 0.3198 |
| MixEval | 0.5935 | 0.63 |
| Average | 0.321 | 0.333 |
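The `Average` row is consistent with putting all nine metrics on a 0-1 scale before averaging: MTBench (scored 0-10) divided by 10 and Alpaca Eval LC (a percentage) divided by 100, with the remaining metrics already in [0, 1]. A sketch of that computation (the normalization is inferred from the reported numbers, not stated explicitly in the paper):

```python
# Sketch: reproduce the "Oumi Repro" average, assuming MTBench is rescaled from
# 0-10 and Alpaca Eval LC from a percentage so that all metrics lie in [0, 1].
oumi_repro = {
    "MTBench": 5.2313 / 10,
    "AlpacaEval_LC": 1.6157 / 100,
    "BBH": 0.4861,
    "GPQA": 0.2903,
    "MATH": 0.0552,
    "MUSR": 0.4116,
    "IFEval": 0.1978,
    "MMLU_Pro": 0.3118,
    "MixEval": 0.5935,
}
print(round(sum(oumi_repro.values()) / len(oumi_repro), 3))  # 0.321
```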