tangled-0.5-0.5b-base
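
Pretraining log and reproduction steps for tangled-0.5-0.5b-base: a core model is pretrained with LitGPT on a packed 8192-token dataset (~7.3B tokens), evaluated on the leaderboard tasks, converted to Hugging Face format, and merged into a Qwen2.5-0.5B base with mergekit.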

Prepare the core pretraining dataset:

time python -B prepare_core_datasets.py
Progress: 100%|████████| 220/220 [23:15<00:00,  6.34s/it]
Workers are finished.
Finished data processing!
i=0, block_size=8192, chunk_size=16384000, len(dataset)=893355, len(dataset) * block_size=7318364160
Total number of tokens in the optimized dataset '../core-data-0-8192-2000' is 7318364160
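
The reported total is simply the number of packed blocks times the block size. A quick check of the figures above:

```python
# Sanity check: total tokens = packed blocks * tokens per block.
block_size = 8192        # from the log line above
num_blocks = 893_355     # len(dataset)

total_tokens = num_blocks * block_size
assert total_tokens == 7_318_364_160   # matches the reported count (~7.3B tokens)
print(f"{total_tokens:,}")             # 7,318,364,160
```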

Pretrain the core model:

CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model.yaml
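
The pretrain-core-model.yaml config itself is not reproduced in this card. Purely as an illustration, a LitGPT pretrain config of this shape uses fields like the following; every value here is an assumption inferred from the logs (seed 23, 8192-token blocks, 256 iterations per optimizer step), not the actual recipe:

```yaml
# Hypothetical sketch of pretrain-core-model.yaml; not the actual config.
model_name: Qwen2.5-0.5B          # assumption; the logged 138M parameter count suggests a smaller custom model_config
out_dir: ../out/pretrain-core
data:
  class_path: litgpt.data.LitData
  init_args:
    data_path: ../core-data-0-8192-2000
train:
  max_seq_length: 8192            # block_size from the data prep step
  micro_batch_size: 1
  global_batch_size: 256          # 256 iterations per optimizer step, as in the log below
  log_interval: 256
eval:
  interval: 100                   # validation appears at step 100 (iter 25600) in the log
logger_name: wandb                # a wandb/ directory is backed up below
seed: 23                          # "Seed set to 23"
```

With that configuration, the run logs: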
Seed set to 23
Time to instantiate model: 0.24 seconds.
Total parameters: 138,084,864
Verifying settings ...
Measured TFLOPs: 6972.54
Epoch 1 | iter 256 step 1 | loss train: 10.530, val: n/a | iter time: 602.41 ms (step) remaining time: 3 days, 12:23:38
Epoch 1 | iter 512 step 2 | loss train: 10.411, val: n/a | iter time: 555.44 ms (step) remaining time: 3 days, 3:42:13
Epoch 1 | iter 768 step 3 | loss train: 10.127, val: n/a | iter time: 555.03 ms (step) remaining time: 3 days, 0:46:23
Epoch 1 | iter 1024 step 4 | loss train: 9.888, val: n/a | iter time: 550.17 ms (step) remaining time: 2 days, 23:17:08
Epoch 1 | iter 1280 step 5 | loss train: 9.696, val: n/a | iter time: 556.48 ms (step) remaining time: 2 days, 22:22:33
Epoch 1 | iter 1536 step 6 | loss train: 9.532, val: n/a | iter time: 555.13 ms (step) remaining time: 2 days, 21:45:36
Epoch 1 | iter 1792 step 7 | loss train: 9.383, val: n/a | iter time: 555.15 ms (step) remaining time: 2 days, 21:18:36
Epoch 1 | iter 2048 step 8 | loss train: 9.293, val: n/a | iter time: 558.12 ms (step) remaining time: 2 days, 20:57:32
Epoch 1 | iter 2304 step 9 | loss train: 9.142, val: n/a | iter time: 556.93 ms (step) remaining time: 2 days, 20:40:40
Epoch 1 | iter 2560 step 10 | loss train: 9.101, val: n/a | iter time: 551.04 ms (step) remaining time: 2 days, 20:26:43
Epoch 1 | iter 2816 step 11 | loss train: 8.990, val: n/a | iter time: 553.37 ms (step) remaining time: 2 days, 20:14:53
Epoch 1 | iter 3072 step 12 | loss train: 8.943, val: n/a | iter time: 550.41 ms (step) remaining time: 2 days, 20:04:39
Epoch 1 | iter 3328 step 13 | loss train: 8.877, val: n/a | iter time: 553.59 ms (step) remaining time: 2 days, 19:55:42
Epoch 1 | iter 3584 step 14 | loss train: 8.816, val: n/a | iter time: 553.64 ms (step) remaining time: 2 days, 19:47:46
Epoch 1 | iter 3840 step 15 | loss train: 8.710, val: n/a | iter time: 554.96 ms (step) remaining time: 2 days, 19:40:44
Epoch 1 | iter 4096 step 16 | loss train: 8.639, val: n/a | iter time: 553.51 ms (step) remaining time: 2 days, 19:34:12
Epoch 1 | iter 4352 step 17 | loss train: 8.535, val: n/a | iter time: 555.68 ms (step) remaining time: 2 days, 19:28:11
Epoch 1 | iter 4608 step 18 | loss train: 8.515, val: n/a | iter time: 553.87 ms (step) remaining time: 2 days, 19:22:34
Epoch 1 | iter 4864 step 19 | loss train: 8.452, val: n/a | iter time: 555.41 ms (step) remaining time: 2 days, 19:17:18
Epoch 1 | iter 5120 step 20 | loss train: 8.415, val: n/a | iter time: 554.44 ms (step) remaining time: 2 days, 19:12:23
# ...
Validating ...
iter 25600: val loss 5.6002, val time: 19943.79 ms
Epoch 1 | iter 25856 step 101 | loss train: 5.227, val: 5.600 | iter time: 553.09 ms (step) remaining time: 2 days, 15:24:37
Epoch 1 | iter 26112 step 102 | loss train: 5.249, val: 5.600 | iter time: 554.72 ms (step) remaining time: 2 days, 15:22:13
Epoch 1 | iter 26368 step 103 | loss train: 5.171, val: 5.600 | iter time: 553.12 ms (step) remaining time: 2 days, 15:19:49
Epoch 1 | iter 26624 step 104 | loss train: 5.163, val: 5.600 | iter time: 553.89 ms (step) remaining time: 2 days, 15:17:24
Epoch 1 | iter 26880 step 105 | loss train: 5.154, val: 5.600 | iter time: 554.35 ms (step) remaining time: 2 days, 15:14:59
Epoch 1 | iter 27136 step 106 | loss train: 5.146, val: 5.600 | iter time: 553.58 ms (step) remaining time: 2 days, 15:12:33
Epoch 1 | iter 27392 step 107 | loss train: 5.132, val: 5.600 | iter time: 553.26 ms (step) remaining time: 2 days, 15:10:07
Epoch 1 | iter 27648 step 108 | loss train: 5.065, val: 5.600 | iter time: 555.73 ms (step) remaining time: 2 days, 15:07:42
Epoch 1 | iter 27904 step 109 | loss train: 5.033, val: 5.600 | iter time: 556.20 ms (step) remaining time: 2 days, 15:05:16
Epoch 1 | iter 28160 step 110 | loss train: 5.061, val: 5.600 | iter time: 554.23 ms (step) remaining time: 2 days, 15:02:51
Epoch 1 | iter 28416 step 111 | loss train: 5.011, val: 5.600 | iter time: 554.38 ms (step) remaining time: 2 days, 15:00:27
Epoch 1 | iter 28672 step 112 | loss train: 5.039, val: 5.600 | iter time: 553.23 ms (step) remaining time: 2 days, 14:58:03
Epoch 1 | iter 28928 step 113 | loss train: 4.979, val: 5.600 | iter time: 554.73 ms (step) remaining time: 2 days, 14:55:39
Epoch 1 | iter 29184 step 114 | loss train: 5.003, val: 5.600 | iter time: 555.53 ms (step) remaining time: 2 days, 14:53:15
Epoch 1 | iter 29440 step 115 | loss train: 4.982, val: 5.600 | iter time: 550.28 ms (step) remaining time: 2 days, 14:50:51
Epoch 1 | iter 29696 step 116 | loss train: 4.995, val: 5.600 | iter time: 552.56 ms (step) remaining time: 2 days, 14:48:27
Epoch 1 | iter 29952 step 117 | loss train: 4.945, val: 5.600 | iter time: 553.72 ms (step) remaining time: 2 days, 14:46:04
Epoch 1 | iter 30208 step 118 | loss train: 4.961, val: 5.600 | iter time: 555.42 ms (step) remaining time: 2 days, 14:43:45
Epoch 1 | iter 30464 step 119 | loss train: 4.824, val: 5.600 | iter time: 555.06 ms (step) remaining time: 2 days, 14:41:17
Epoch 1 | iter 30720 step 120 | loss train: 4.830, val: 5.600 | iter time: 555.66 ms (step) remaining time: 2 days, 14:38:49
# ...
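
Each optimizer step spans 256 iterations (gradient accumulation), so the run logs once per step. Train loss falls from about 10.5 at step 1 to about 4.8 by step 120, with a validation loss of 5.60 at iter 25600. To pull the loss curve out of a saved console log, a small parser along these lines works (the log file name is an assumption):

```python
import re

# Matches lines like:
# "Epoch 1 | iter 256 step 1 | loss train: 10.530, val: n/a | ..."
PATTERN = re.compile(r"iter (\d+) step (\d+) \| loss train: ([\d.]+)")

def parse_losses(path):
    """Return (step, train_loss) pairs from a LitGPT pretrain console log."""
    points = []
    with open(path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                points.append((int(m.group(2)), float(m.group(3))))
    return points

# Usage, assuming the console output was saved to pretrain-core.log:
# print(parse_losses("pretrain-core.log")[:5])
```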

Back up the wandb logs:

mv wandb wandb-pretrain-core
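
Renaming the run directory keeps the core-pretraining wandb logs separate from later runs, which would otherwise write into the same wandb/ directory.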

Chat with the model:

CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core/final

Evaluate on the leaderboard tasks:

CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core/final'
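
litgpt evaluate runs the checkpoint through lm-evaluation-harness; the leaderboard task group corresponds to the Open LLM Leaderboard v2 suite (BBH, GPQA, IFEval, MATH level 5, MMLU-Pro, MUSR). As a side effect it writes Hugging Face format weights (pytorch_model.bin) into the --out_dir, which the conversion steps below reuse.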

Convert the core checkpoint to Hugging Face format, merge it into the base model, and convert back:

# Consolidate the pretraining checkpoint, then export it in Hugging Face layout
litgpt convert_pretrained_checkpoint ../out/pretrain-core/final ../out/pretrain-core-converted
litgpt convert_from_litgpt ../out/pretrain-core-converted/ ../out/pretrain-core-converted
# Reuse the HF-format weights written by litgpt evaluate during the leaderboard run
cp ../evaluate/pretrain-core/leaderboard/pytorch_model.bin ../out/pretrain-core-converted
# Merge the core model into the base per merge-core-into-base.yaml
mergekit-yaml merge-core-into-base.yaml ../out/pretrain-base-converted --clone-tensors
# Convert the merged checkpoint back to LitGPT format (Qwen2.5-0.5B architecture)
litgpt convert_to_litgpt --model_name "Qwen2.5-0.5B" --dtype bfloat16 ../out/pretrain-base-converted/
# Smoke-test the merged model
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-base-converted/
# Publish a clean copy without merge/LitGPT artifacts
cp -r ../out/pretrain-base-converted/ ../out/pretrain-base
rm ../out/pretrain-base/lit_model.pth ../out/pretrain-base/mergekit_config.yml ../out/pretrain-base/model_config.yaml
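
The merge-core-into-base.yaml file is not included in this card. mergekit configs follow a standard schema; purely as an illustration (the method, model paths, and weights below are assumptions, not the actual recipe):

```yaml
# Hypothetical sketch only; not the actual merge-core-into-base.yaml.
merge_method: linear
models:
  - model: ../out/pretrain-core-converted   # the core model trained above
    parameters:
      weight: 0.5
  - model: Qwen/Qwen2.5-0.5B                # assumed base model
    parameters:
      weight: 0.5
dtype: bfloat16
```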