---
license: mit
datasets:
- Replete-AI/code_bagel
language:
- en
tags:
- code
pipeline_tag: text-generation
---

### Base Model
[microsoft/Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct)

### Datasets
[Replete-AI/code_bagel](https://huggingface.co/datasets/Replete-AI/code_bagel)
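
The dataset can be pulled directly from the Hugging Face Hub with the `datasets` library. A minimal sketch (the `train` split name is an assumption; adjust it to the dataset's actual layout):

```python
from datasets import load_dataset

# Load the Replete-AI/code_bagel dataset from the Hugging Face Hub.
# The "train" split name is assumed here; check the dataset card for the actual splits/columns.
dataset = load_dataset("Replete-AI/code_bagel", split="train")

print(dataset)      # number of rows and column names
print(dataset[0])   # inspect a single example
```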

### SFT Training Code
https://github.com/hiyouga/LLaMA-Factory

### Train Loss
![image/png](https://cdn-uploads.huggingface.co/production/uploads/636f54b95d2050767e4a6317/tOBahj5rDAJzqCmftVdkX.png)

### Train State
Trainable params: 27,852,800 || all params: 13,988,090,880 || trainable%: 0.1991<br>
Total training duration: 69h 18m 17s
```json
{
    "epoch": 0.9999679800589659,
    "total_flos": 1.446273483573748e+20,
    "train_loss": 0.44412665014957775,
    "train_runtime": 249497.725,
    "train_samples_per_second": 13.018,
    "train_steps_per_second": 0.102
}
```
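
The trainable-parameter ratio above is easy to re-derive; a quick check (plain arithmetic, numbers copied from the summary):

```python
# Re-derive the trainable% figure reported in the training summary above.
trainable_params = 27_852_800
all_params = 13_988_090_880

print(f"trainable%: {100 * trainable_params / all_params:.4f}")  # -> 0.1991
# With a PEFT-wrapped model, model.print_trainable_parameters() reports the same summary.
```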

### Sample inference code
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)
model_id = "/home/models/phi3/Phi-3-medium-128k-instruct/"  # local path to the base model; replace with a Hub model id if needed
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda", 
    torch_dtype="auto", 
    trust_remote_code=True, 
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "Write a python code to train llm mode by lora and sft ?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 4096,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

```
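
If this repository holds LoRA adapter weights rather than fully merged weights, the adapter can be attached to the base model with PEFT before reusing the pipeline above. This is a hedged sketch: it assumes the repo id `REILX/Phi-3-medium-128k-code-instruct` contains the adapter files.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "microsoft/Phi-3-medium-128k-instruct"
adapter_id = "REILX/Phi-3-medium-128k-code-instruct"  # assumed to contain the LoRA adapter

base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach the fine-tuned LoRA weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

# Optionally fold the adapter into the base weights so plain transformers inference works.
model = model.merge_and_unload()
```

After merging, `model` and `tokenizer` can be passed to the `pipeline(...)` call from the sample above unchanged.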

### Outputs by microsoft/Phi-3-medium-128k-instruct
To train a language model using Lora and SFT (Supervised Fine-tuning), you can follow these steps:

1. Install the required libraries:

```python
!pip install transformers
```

2. Import the necessary libraries:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments
```

3. Load the pre-trained Lora model and tokenizer:

```python
tokenizer = AutoTokenizer.from_pretrained("microsoft/llama-65b-lora")
model = AutoModelForCausalLM.from_pretrained("microsoft/llama-65b-lora")
```

4. Load the SFT dataset:

```python
# Replace with the path to your SFT dataset
train_dataset =...
```

5. Define the training arguments:

```python
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)
```

6. Create a custom Trainer for Lora and SFT:

```python
class LoraSFTTrainer(Trainer):
    def compute_loss(self, model, inputs):
        labels = inputs.pop("labels")
        outputs = model(**inputs, use_cache=False)
        lm_logits = outputs.logits
        loss_fct = torch.nn.CrossEntropyLoss()
        masked_lm_loss = loss_fct(lm_logits.view(-1, self.model.config.vocab_size), labels.view(-1))
        return masked_lm_loss
```

7. Initialize the trainer and train the model:

```python
trainer = LoraSFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=None,
)

trainer.train()
```


### Outputs by REILX/Phi-3-medium-128k-code-instruct
```python
import torch
from transformers import RobertaForCausalLM, RobertaTokenizer

# Load the model and tokenizer
model = RobertaForCausalLM.from_pretrained('roberta-base')
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Load the data
data = [
    "This is a sample sentence.",
    "Another sample sentence."
]

# Tokenize the data
input_ids = [tokenizer.encode(sentence, add_special_tokens=True) for sentence in data]

# Train the model
model.train()
for input_id in input_ids:
    outputs = model(input_id, labels=input_id)
    loss = outputs.loss
    loss.backward()
    optimizer.step()

# Save the model
model.save_pretrained('my_model')

```

### Training hyperparameters

The following hyperparameters were used during training (an illustrative PEFT/Transformers equivalent is sketched after the list):
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1200
- num_epochs: 1.0
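
The run itself was driven by LLaMA-Factory (linked above), but the listed hyperparameters map onto a plain PEFT + Transformers setup. The sketch below is illustrative only: the LoRA rank, alpha, dropout, and target modules are not stated on this card and are placeholders, and the tiny in-memory dataset stands in for the tokenized Replete-AI/code_bagel split.

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_id = "microsoft/Phi-3-medium-128k-instruct"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    trust_remote_code=True,
)

# LoRA settings: rank/alpha/dropout/target modules are NOT given on this card
# and are placeholders chosen for illustration only.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj"],  # assumed Phi-3 attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Tiny in-memory stand-in so the sketch runs end to end; swap in the real,
# chat-templated Replete-AI/code_bagel data for an actual training run.
toy = Dataset.from_dict({"text": ["def add(a, b):\n    return a + b"]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

train_dataset = toy.map(tokenize, batched=True, remove_columns=["text"])

# Values taken from the hyperparameter list above
# (8 GPUs x batch size 1 x gradient accumulation 16 = effective batch size 128).
training_args = TrainingArguments(
    output_dir="./phi3-code-lora",
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_steps=1200,
    seed=42,
    logging_steps=10,
    bf16=True,  # assumption: training precision is not stated on the card
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```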