---
license: mit
datasets:
- Replete-AI/code_bagel
language:
- en
tags:
- code
pipeline_tag: text-generation
---
### Base model
[microsoft/Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct)
### Datasets
[Replete-AI/code_bagel](https://huggingface.co/datasets/Replete-AI/code_bagel)
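As a quick reference, the dataset can be pulled straight from the Hub with the `datasets` library. This is a minimal sketch: the `train` split name is assumed, and the columns are printed rather than hard-coded since the schema is not documented here.
```python
from datasets import load_dataset

# Stream a single example of Replete-AI/code_bagel to inspect its schema
# without downloading everything ("train" split name is an assumption).
ds = load_dataset("Replete-AI/code_bagel", split="train", streaming=True)
example = next(iter(ds))
print(example.keys())
print(example)
```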
### SFT Training Code
https://github.com/hiyouga/LLaMA-Factory
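The exact LLaMA-Factory configuration is not included in this card. As a rough illustration only, the sketch below shows an equivalent LoRA SFT setup with `peft` and the `transformers` `Trainer`. The learning rate, scheduler, warmup steps, per-device batch size, gradient accumulation, and epoch count are copied from the hyperparameters listed at the end of this card; the LoRA rank/alpha/target modules, the prompt formatting, and the 2048-token cutoff are placeholder assumptions, not the settings actually used to train this model.
```python
# Illustrative LoRA SFT sketch (NOT the actual LLaMA-Factory run).
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "microsoft/Phi-3-medium-128k-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# LoRA adapter: rank, alpha, and target modules are placeholders, not the
# values used for REILX/Phi-3-medium-128k-code-instruct.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # trainable vs. total parameter counts

def tokenize(example):
    # code_bagel's column names are not documented here; "input"/"output"
    # are assumptions, adjust to the dataset's actual schema.
    text = str(example.get("input", "")) + "\n" + str(example.get("output", ""))
    return tokenizer(text, truncation=True, max_length=2048)

dataset = load_dataset("Replete-AI/code_bagel", split="train")
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

# Hyperparameters below mirror the "Training hyperparameters" section.
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./phi3-code-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=5e-5,
        lr_scheduler_type="cosine",
        warmup_steps=1200,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```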
### Train Loss
Final train_loss: 0.444 (see the training state below).
### Train State
Trainable params: 27852800 || all params: 13988090880 || trainable%: 0.1991<br/>
Total training duration: 69h 18m 17s
```json
{
"epoch": 0.9999679800589659,
"total_flos": 1.446273483573748e+20,
"train_loss": 0.44412665014957775,
"train_runtime": 249497.725,
"train_samples_per_second": 13.018,
"train_steps_per_second": 0.102
}
```
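For reference, the reported trainable percentage and total duration follow directly from these numbers:
```python
# Sanity-check the reported training state.
trainable, total = 27_852_800, 13_988_090_880
print(f"trainable%: {100 * trainable / total:.4f}")  # 0.1991

runtime_s = 249_497.725
h, rem = divmod(runtime_s, 3600)
m, s = divmod(rem, 60)
print(f"{int(h)}h {int(m)}m {int(s)}s")  # 69h 18m 17s
```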
### Sample inference code
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

# Local path to the model weights; replace with your own path or a Hub repo id.
model_id = "/home/models/phi3/Phi-3-medium-128k-instruct/"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "Write a python code to train llm mode by lora and sft ?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Greedy decoding (do_sample=False), so temperature has no effect here.
generation_args = {
    "max_new_tokens": 4096,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]["generated_text"])
```
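To run the fine-tuned model instead of the local base-model path above, point `model_id` at the Hub repository (this assumes the repository hosts full merged weights rather than a standalone LoRA adapter):
```python
# Assumption: the fine-tuned weights are published as a full model under this repo id.
model_id = "REILX/Phi-3-medium-128k-code-instruct"
```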
### Outputs by microsoft/Phi-3-medium-128k-instruct
To train a language model using Lora and SFT (Supervised Fine-tuning), you can follow these steps:
1. Install the required libraries:
```python
!pip install transformers
```
2. Import the necessary libraries:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments
```
3. Load the pre-trained Lora model and tokenizer:
```python
tokenizer = AutoTokenizer.from_pretrained("microsoft/llama-65b-lora")
model = AutoModelForCausalLM.from_pretrained("microsoft/llama-65b-lora")
```
4. Load the SFT dataset:
```python
# Replace with the path to your SFT dataset
train_dataset =...
```
5. Define the training arguments:
```python
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)
```
6. Create a custom Trainer for Lora and SFT:
```python
class LoraSFTTrainer(Trainer):
    def compute_loss(self, model, inputs):
        labels = inputs.pop("labels")
        outputs = model(**inputs, use_cache=False)
        lm_logits = outputs.logits
        loss_fct = torch.nn.CrossEntropyLoss()
        masked_lm_loss = loss_fct(lm_logits.view(-1, self.model.config.vocab_size), labels.view(-1))
        return masked_lm_loss
```
7. Initialize the trainer and train the model:
```python
trainer = LoraSFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=None,
)
trainer.train()
```
### Outputs by REILX/Phi-3-medium-128k-code-instruct
```python
import torch
from transformers import RobertaForCausalLM, RobertaTokenizer
# Load the model and tokenizer
model = RobertaForCausalLM.from_pretrained('roberta-base')
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
# Load the data
data = [
"This is a sample sentence.",
"Another sample sentence."
]
# Tokenize the data
input_ids = [tokenizer.encode(sentence, add_special_tokens=True) for sentence in data]
# Train the model
model.train()
for input_id in input_ids:
    outputs = model(input_id, labels=input_id)
    loss = outputs.loss
    loss.backward()
    optimizer.step()
# Save the model
model.save_pretrained('my_model')
```
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128 (see the note after this list)
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1200
- num_epochs: 1.0
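The effective batch sizes above follow from the per-device batch size, the number of devices, and gradient accumulation:
```python
# total_train_batch_size = per-device train batch * num_devices * gradient_accumulation_steps
print(1 * 8 * 16)  # 128
# total_eval_batch_size = per-device eval batch * num_devices
print(8 * 8)       # 64
```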