---
license: llama2
base_model: meta-llama/CodeLlama-34b-Instruct-hf
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- meng-lab/CodeLlama-34B-Instruct-xsum
model-index:
- name: CodeLlama-34b-Instruct-sft-5e-3-epoch-100-xsum
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/uva-llm/huggingface/runs/trj5frth)
# CodeLlama-34b-Instruct-sft-5e-3-epoch-100-xsum

This model is a fine-tuned version of [meta-llama/CodeLlama-34b-Instruct-hf](https://huggingface.co/meta-llama/CodeLlama-34b-Instruct-hf) on the [meng-lab/CodeLlama-34B-Instruct-xsum](https://huggingface.co/datasets/meng-lab/CodeLlama-34B-Instruct-xsum) dataset.
It achieves the following results on the evaluation set:
- Loss: 5.3547
- Loss Layer 6 Head: 1.5863
- Loss Layer 12 Head: 1.2384
- Loss Layer 18 Head: 1.0729
- Loss Layer 24 Head: 0.6857
- Loss Layer 30 Head: 0.4438
- Loss Layer 36 Head: 0.2842
- Loss Layer 42 Head: 0.1685

## Model description

More information needed

## Intended uses & limitations

More information needed
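
In the absence of an official usage guide, below is a minimal inference sketch. It assumes the checkpoint is published under the hypothetical repo id `meng-lab/CodeLlama-34b-Instruct-sft-5e-3-epoch-100-xsum` (inferred from the model name and dataset organization) and that it loads with the standard `transformers` auto classes; the per-layer-head losses above suggest auxiliary early-exit heads, which may require the project's own modeling code instead.

```python
# Minimal inference sketch; repo id and prompt format are assumptions, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meng-lab/CodeLlama-34b-Instruct-sft-5e-3-epoch-100-xsum"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 34B parameters: use reduced precision and shard across GPUs
    device_map="auto",
)

# CodeLlama-Instruct uses the Llama-2 [INST] ... [/INST] format; the exact
# summarization prompt used in training is not documented here.
prompt = "[INST] Summarize the following article in one sentence:\n<article text> [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```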

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.005
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 100
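
The values above map onto a `transformers` `TrainingArguments` configuration roughly as sketched below; this is a non-authoritative reconstruction, not the exact alignment-handbook recipe, and the multi-GPU/DeepSpeed setup is omitted. The effective batch sizes follow from 1 per device × 8 GPUs × 16 accumulation steps = 128 for training, and 2 × 8 = 16 for evaluation.

```python
# Hedged sketch of an equivalent TrainingArguments setup, reconstructed from the
# hyperparameters listed above; not the exact configuration used for this run.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="CodeLlama-34b-Instruct-sft-5e-3-epoch-100-xsum",
    learning_rate=5e-3,
    per_device_train_batch_size=1,   # x 8 GPUs x 16 accumulation steps = 128 effective
    per_device_eval_batch_size=2,    # x 8 GPUs = 16 effective
    gradient_accumulation_steps=16,
    num_train_epochs=100,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption: mixed precision is not stated in this card
)
```

Adam betas (0.9, 0.999) and epsilon 1e-08 are the `TrainingArguments` defaults, so they are not set explicitly.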

### Training results

| Training Loss | Epoch | Step | Validation Loss | Loss Layer 6 Head | Loss Layer 12 Head | Loss Layer 18 Head | Loss Layer 24 Head | Loss Layer 30 Head | Loss Layer 36 Head | Loss Layer 42 Head |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|:------------------:|:------------------:|:------------------:|:------------------:|:------------------:|:------------------:|
| 5.7055        | 2.56  | 200  | 6.7354          | 1.7854            | 1.4923             | 1.4206             | 0.8735             | 0.6246             | 0.4658             | 0.4023             |
| 4.2132        | 5.12  | 400  | 6.1546          | 1.7830            | 1.3957             | 1.1440             | 0.7581             | 0.6526             | 0.3538             | 0.2167             |
| 4.081         | 7.68  | 600  | 6.0643          | 1.6946            | 1.4230             | 1.1566             | 0.8413             | 0.5291             | 0.3589             | 0.2186             |
| 3.5585        | 10.24 | 800  | 5.8829          | 1.6599            | 1.3383             | 1.1385             | 0.7602             | 0.5903             | 0.3677             | 0.2221             |
| 3.5251        | 12.8  | 1000 | 5.7000          | 1.6490            | 1.2994             | 1.0979             | 0.7252             | 0.5119             | 0.3438             | 0.2164             |
| 3.1679        | 15.36 | 1200 | 5.6536          | 1.6224            | 1.2553             | 1.1685             | 0.7247             | 0.5292             | 0.3125             | 0.1873             |
| 3.2193        | 17.92 | 1400 | 5.5506          | 1.5900            | 1.2721             | 1.0925             | 0.7382             | 0.4849             | 0.3224             | 0.1969             |
| 3.0832        | 20.48 | 1600 | 5.5640          | 1.5978            | 1.2975             | 1.1012             | 0.7319             | 0.4884             | 0.3065             | 0.1891             |
| 2.9621        | 23.04 | 1800 | 5.5682          | 1.6054            | 1.2700             | 1.1180             | 0.7373             | 0.4615             | 0.2985             | 0.2074             |
| 3.0878        | 25.6  | 2000 | 5.7224          | 1.6020            | 1.4047             | 1.1298             | 0.7446             | 0.4841             | 0.3109             | 0.1890             |
| 2.8619        | 28.16 | 2200 | 5.5169          | 1.5917            | 1.2565             | 1.0982             | 0.7340             | 0.4624             | 0.3221             | 0.2038             |
| 2.9146        | 30.72 | 2400 | 5.4960          | 1.6334            | 1.2661             | 1.0884             | 0.7008             | 0.4590             | 0.3066             | 0.1775             |
| 2.8805        | 33.28 | 2600 | 5.7326          | 1.7120            | 1.2473             | 1.1268             | 0.8572             | 0.5254             | 0.3132             | 0.1889             |
| 2.8492        | 35.84 | 2800 | 5.5193          | 1.6050            | 1.2626             | 1.0868             | 0.7980             | 0.4569             | 0.2897             | 0.1967             |
| 2.7414        | 38.4  | 3000 | 5.5041          | 1.5895            | 1.2722             | 1.1454             | 0.6997             | 0.4646             | 0.2958             | 0.1719             |
| 2.8092        | 40.96 | 3200 | 5.4876          | 1.5899            | 1.2512             | 1.0805             | 0.7123             | 0.4602             | 0.3544             | 0.1739             |
| 2.5986        | 43.52 | 3400 | 5.4265          | 1.5933            | 1.2407             | 1.0890             | 0.6999             | 0.4719             | 0.2914             | 0.1743             |
| 2.5645        | 46.08 | 3600 | 5.4640          | 1.5893            | 1.2546             | 1.0868             | 0.7156             | 0.4573             | 0.3096             | 0.1809             |
| 2.6286        | 48.64 | 3800 | 5.4074          | 1.5805            | 1.2430             | 1.0898             | 0.6973             | 0.4577             | 0.2949             | 0.1757             |
| 2.5402        | 51.2  | 4000 | 5.4498          | 1.6051            | 1.2551             | 1.0857             | 0.7044             | 0.4704             | 0.2965             | 0.1833             |
| 2.6027        | 53.76 | 4200 | 5.5040          | 1.6330            | 1.2577             | 1.0813             | 0.7198             | 0.5051             | 0.3221             | 0.1834             |
| 2.4852        | 56.32 | 4400 | 5.4356          | 1.5925            | 1.2526             | 1.0858             | 0.7114             | 0.4580             | 0.2926             | 0.1861             |
| 2.4804        | 58.88 | 4600 | 5.4179          | 1.5895            | 1.2417             | 1.0782             | 0.7668             | 0.4488             | 0.2870             | 0.1708             |
| 2.4591        | 61.44 | 4800 | 5.3843          | 1.5925            | 1.2437             | 1.0750             | 0.6884             | 0.4509             | 0.2912             | 0.1708             |
| 2.4773        | 64.0  | 5000 | 5.4038          | 1.5952            | 1.2450             | 1.0797             | 0.6915             | 0.4486             | 0.2933             | 0.1994             |
| 2.4562        | 66.56 | 5200 | 5.3922          | 1.5918            | 1.2485             | 1.0776             | 0.6968             | 0.4479             | 0.2871             | 0.1696             |
| 2.3506        | 69.12 | 5400 | 5.3768          | 1.5882            | 1.2454             | 1.0791             | 0.6869             | 0.4474             | 0.2867             | 0.1710             |
| 2.4044        | 71.68 | 5600 | 5.3605          | 1.5856            | 1.2385             | 1.0739             | 0.6914             | 0.4472             | 0.2856             | 0.1700             |
| 2.3106        | 74.24 | 5800 | 5.4110          | 1.5956            | 1.2418             | 1.0776             | 0.6972             | 0.4813             | 0.2891             | 0.1908             |
| 2.3976        | 76.8  | 6000 | 5.3686          | 1.5894            | 1.2410             | 1.0754             | 0.6877             | 0.4455             | 0.2856             | 0.1685             |
| 2.2507        | 79.36 | 6200 | 5.3727          | 1.5923            | 1.2414             | 1.0760             | 0.6877             | 0.4455             | 0.2852             | 0.1701             |
| 2.3297        | 81.92 | 6400 | 5.3620          | 1.5871            | 1.2407             | 1.0748             | 0.6867             | 0.4443             | 0.2855             | 0.1686             |
| 2.2224        | 84.48 | 6600 | 5.3621          | 1.5881            | 1.2408             | 1.0751             | 0.6865             | 0.4444             | 0.2846             | 0.1687             |
| 2.2312        | 87.04 | 6800 | 5.3594          | 1.5863            | 1.2400             | 1.0735             | 0.6862             | 0.4446             | 0.2846             | 0.1689             |
| 2.2597        | 89.6  | 7000 | 5.3562          | 1.5858            | 1.2387             | 1.0732             | 0.6860             | 0.4440             | 0.2844             | 0.1684             |
| 2.201         | 92.16 | 7200 | 5.3562          | 1.5867            | 1.2387             | 1.0733             | 0.6861             | 0.4438             | 0.2842             | 0.1684             |
| 2.2423        | 94.72 | 7400 | 5.3539          | 1.5862            | 1.2380             | 1.0726             | 0.6856             | 0.4438             | 0.2842             | 0.1686             |
| 2.2145        | 97.28 | 7600 | 5.3546          | 1.5863            | 1.2384             | 1.0728             | 0.6857             | 0.4437             | 0.2842             | 0.1686             |
| 2.2007        | 99.84 | 7800 | 5.3547          | 1.5863            | 1.2384             | 1.0729             | 0.6857             | 0.4438             | 0.2842             | 0.1685             |


### Framework versions

- Transformers 4.43.2
- Pytorch 2.1.2
- Datasets 3.2.0
- Tokenizers 0.19.1