## Setup Notes
|
|
|
For this model, a VM with 2 NVIDIA T4 GPUs was used.
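As a quick sanity check before launching a multi-GPU run (not part of the original log), you can confirm both GPUs are visible to PyTorch:

```python
import torch

# Confirm both T4s are visible before launching torchrun.
print(torch.cuda.device_count())              # expect 2
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))   # expect 'Tesla T4'
```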
|
|
|
Note 1. The output directory was initially `lora-alpaca`; its contents were later moved to a new folder when the git repository was initialized.
|
|
|
|
|
## Log
|
|
|
(sqltest) chrisdono@deep-learning-duo-t4-3:~/alpaca-lora$ WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'spider' --output_dir './lora-alpaca' --num_epochs 10 --batch_size 32 --micro_batch_size 16 --learning_rate '9e-5' --add_eos_token
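For context on how the `--batch_size` and `--micro_batch_size` flags interact with the two GPUs, here is a sketch paraphrasing the logic in alpaca-lora's `finetune.py` (not a verbatim copy): the global batch is split into per-device micro-batches, and any remainder is made up with gradient accumulation.

```python
import os

# Paraphrased sketch of alpaca-lora's finetune.py batch handling.
batch_size = 32        # --batch_size (effective global batch)
micro_batch_size = 16  # --micro_batch_size (per forward/backward pass)

gradient_accumulation_steps = batch_size // micro_batch_size   # 2
world_size = int(os.environ.get("WORLD_SIZE", 1))              # 2 in this run
if world_size != 1:  # under DDP, each rank already contributes micro-batches
    gradient_accumulation_steps //= world_size                 # 1
print(gradient_accumulation_steps)
```

With the flags above, each GPU processes micro-batches of 16 with no accumulation, giving the effective global batch of 32.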
|
|
|
|
|
Appending the final loss values below, which were not captured in the trainer state JSON from the last checkpoint.
|
|
|
{'loss': 0.241, 'learning_rate': 1.0040816326530613e-05, 'epoch': 8.98}
{'loss': 0.2343, 'learning_rate': 9.42857142857143e-06, 'epoch': 9.04}
{'loss': 0.2376, 'learning_rate': 8.816326530612245e-06, 'epoch': 9.11}
{'loss': 0.2355, 'learning_rate': 8.204081632653062e-06, 'epoch': 9.17}
{'loss': 0.229, 'learning_rate': 7.591836734693877e-06, 'epoch': 9.24}
{'loss': 0.2325, 'learning_rate': 6.979591836734694e-06, 'epoch': 9.3}
{'loss': 0.24, 'learning_rate': 6.367346938775511e-06, 'epoch': 9.36}
{'loss': 0.2438, 'learning_rate': 5.755102040816327e-06, 'epoch': 9.43}
{'loss': 0.2391, 'learning_rate': 5.142857142857143e-06, 'epoch': 9.49}
{'loss': 0.2351, 'learning_rate': 4.530612244897959e-06, 'epoch': 9.55}
{'loss': 0.2289, 'learning_rate': 3.9183673469387755e-06, 'epoch': 9.62}
{'loss': 0.2294, 'learning_rate': 3.3061224489795924e-06, 'epoch': 9.68}
{'loss': 0.2344, 'learning_rate': 2.693877551020408e-06, 'epoch': 9.75}
{'loss': 0.2358, 'learning_rate': 2.0816326530612247e-06, 'epoch': 9.81}
{'loss': 0.2365, 'learning_rate': 1.469387755102041e-06, 'epoch': 9.87}
{'loss': 0.2309, 'learning_rate': 8.571428571428572e-07, 'epoch': 9.94}
{'loss': 0.2438, 'learning_rate': 2.4489795918367347e-07, 'epoch': 10.0}
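One way to fold these trailing entries back into the checkpoint's `trainer_state.json` is sketched below, assuming the standard Hugging Face Trainer checkpoint layout (the checkpoint path here is hypothetical):

```python
import json

# Sketch: merge the trailing loss entries above into the last checkpoint's
# trainer_state.json. The Trainer stores logged metrics under "log_history".
extra = [
    {'loss': 0.241, 'learning_rate': 1.0040816326530613e-05, 'epoch': 8.98},
    # ... remaining entries from the log above ...
    {'loss': 0.2438, 'learning_rate': 2.4489795918367347e-07, 'epoch': 10.0},
]

path = "./lora-alpaca/checkpoint-1400/trainer_state.json"  # hypothetical path
with open(path) as f:
    state = json.load(f)
state["log_history"].extend(extra)
with open(path, "w") as f:
    json.dump(state, f, indent=2)
```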
|
100%|██████████| 1570/1570 [4:45:44<00:00, 10.92s/it]
{'train_runtime': 17144.6766, 'train_samples_per_second': 2.916, 'train_steps_per_second': 0.092, 'train_loss': 0.41175747267000234, 'epoch': 10.0}
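As a rough consistency check on the reported stats (simple arithmetic, not from the original log):

```python
# Rough consistency check on the reported training stats.
train_runtime = 17144.6766      # seconds
steps = 1570
samples_per_second = 2.916

print(train_runtime / steps)               # ~10.92 s/it, matches the progress bar
print(samples_per_second * train_runtime)  # ~50,000 samples total
# ~50,000 samples / 10 epochs ≈ 5,000 training examples per epoch,
# i.e. 157 steps/epoch at --batch_size 32 (157 * 10 = 1570 steps).
```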
|