stojchet/lr_sft2

+---
+base_model: deepseek-ai/deepseek-coder-1.3b-base
+datasets:
+- generator
+library_name: peft
+license: other
+tags:
+- trl
+- sft
+- generated_from_trainer
+model-index:
+- name: lr_sft2
+  results: []
+---
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/stojchets/huggingface/runs/lr_sft2)
+# lr_sft2
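+This model is a PEFT fine-tune of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) on the `generator` dataset. `generator` is the placeholder name that the `datasets` library assigns to data created with `Dataset.from_generator`, so the underlying training data is not otherwise identified.
+At the end of training (step 78), it achieves the following result on the evaluation set: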
+- Loss: 1.2345
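Because the card lists `library_name: peft`, the published weights are an adapter that sits on top of the base model. The sketch below shows one way to load it, under these assumptions:

- The adapter lives under the Hub id `stojchet/lr_sft2`. This is inferred from the repository path and is not stated in the card.
- `transformers` and `peft` are installed.
- The helper name `load_lr_sft2` is hypothetical.

```python
def load_lr_sft2(adapter_id: str = "stojchet/lr_sft2",
                 base_id: str = "deepseek-ai/deepseek-coder-1.3b-base"):
    """Load the base model, attach the PEFT adapter, and return (model, tokenizer).

    adapter_id is an assumption inferred from the repository path.
    Imports are deferred so this helper can be defined without the heavy dependencies.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # The tokenizer is taken from the base model, which the adapter was trained on.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights stored under adapter_id.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

Usage (this downloads the weights on the first call):

1. Call `model, tokenizer = load_lr_sft2()`.
2. Tokenize a prompt with `tokenizer(prompt, return_tensors="pt")`.
3. Pass the result to `model.generate(**inputs, max_new_tokens=64)`.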
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 1.41e-06
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 128
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- num_epochs: 1
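The batch-size and scheduler values above fit together as sketched below. The sketch makes three assumptions:

- **Single device.** This is the only world size consistent with 8 × 16 = 128.
- **No warmup.** None is listed.
- **78 optimizer steps in total.** This is the last step in the training results table.

`linear_lr` reproduces the shape of the Transformers linear schedule with zero warmup; it is an illustration, not the library code.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1.41e-06
TRAIN_BATCH_SIZE = 8        # per-device micro-batch
GRAD_ACCUM_STEPS = 16
NUM_DEVICES = 1             # assumption: implied by total_train_batch_size = 128
TOTAL_STEPS = 78            # assumption: last optimizer step in the results table


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of sequences that contribute to each optimizer update."""
    return per_device * accum * devices


def linear_lr(step: int, base_lr: float = LEARNING_RATE,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` updates: linear decay to zero, no warmup."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 128
print(f"{linear_lr(39):.3e}")  # 7.050e-07, half the peak rate at mid-training
```

With zero warmup, this linear decay reaches `0.0` exactly at the final step. That is why the sketch clamps `remaining` at zero for any later step.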
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 1.2662        | 0.0128 | 1    | 1.2399          |
+| 1.2265        | 0.0256 | 2    | 1.2398          |
+| 1.2592        | 0.0384 | 3    | 1.2396          |
+| 1.1588        | 0.0512 | 4    | 1.2395          |
+| 1.2261        | 0.064  | 5    | 1.2393          |
+| 1.2145        | 0.0768 | 6    | 1.2392          |
+| 1.2194        | 0.0896 | 7    | 1.2390          |
+| 1.2688        | 0.1024 | 8    | 1.2389          |
+| 1.2326        | 0.1152 | 9    | 1.2388          |
+| 1.2506        | 0.128  | 10   | 1.2386          |
+| 1.2719        | 0.1408 | 11   | 1.2385          |
+| 1.2007        | 0.1536 | 12   | 1.2384          |
+| 1.1761        | 0.1664 | 13   | 1.2383          |
+| 1.2937        | 0.1792 | 14   | 1.2382          |
+| 1.2277        | 0.192  | 15   | 1.2381          |
+| 1.2658        | 0.2048 | 16   | 1.2379          |
+| 1.2467        | 0.2176 | 17   | 1.2378          |
+| 1.258         | 0.2304 | 18   | 1.2377          |
+| 1.2024        | 0.2432 | 19   | 1.2376          |
+| 1.2011        | 0.256  | 20   | 1.2375          |
+| 1.2371        | 0.2688 | 21   | 1.2374          |
+| 1.2095        | 0.2816 | 22   | 1.2373          |
+| 1.2481        | 0.2944 | 23   | 1.2372          |
+| 1.2934        | 0.3072 | 24   | 1.2371          |
+| 1.2088        | 0.32   | 25   | 1.2370          |
+| 1.2565        | 0.3328 | 26   | 1.2369          |
+| 1.2254        | 0.3456 | 27   | 1.2368          |
+| 1.2002        | 0.3584 | 28   | 1.2367          |
+| 1.1977        | 0.3712 | 29   | 1.2366          |
+| 1.1858        | 0.384  | 30   | 1.2366          |
+| 1.1915        | 0.3968 | 31   | 1.2365          |
+| 1.22          | 0.4096 | 32   | 1.2364          |
+| 1.2649        | 0.4224 | 33   | 1.2363          |
+| 1.2383        | 0.4352 | 34   | 1.2362          |
+| 1.1996        | 0.448  | 35   | 1.2361          |
+| 1.1884        | 0.4608 | 36   | 1.2361          |
+| 1.2159        | 0.4736 | 37   | 1.2360          |
+| 1.2392        | 0.4864 | 38   | 1.2359          |
+| 1.272         | 0.4992 | 39   | 1.2359          |
+| 1.2083        | 0.512  | 40   | 1.2358          |
+| 1.2369        | 0.5248 | 41   | 1.2357          |
+| 1.2324        | 0.5376 | 42   | 1.2357          |
+| 1.1785        | 0.5504 | 43   | 1.2356          |
+| 1.2122        | 0.5632 | 44   | 1.2355          |
+| 1.2011        | 0.576  | 45   | 1.2355          |
+| 1.2412        | 0.5888 | 46   | 1.2354          |
+| 1.187         | 0.6016 | 47   | 1.2353          |
+| 1.2275        | 0.6144 | 48   | 1.2353          |
+| 1.2167        | 0.6272 | 49   | 1.2352          |
+| 1.2042        | 0.64   | 50   | 1.2352          |
+| 1.239         | 0.6528 | 51   | 1.2351          |
+| 1.1876        | 0.6656 | 52   | 1.2351          |
+| 1.2362        | 0.6784 | 53   | 1.2350          |
+| 1.2018        | 0.6912 | 54   | 1.2350          |
+| 1.1839        | 0.704  | 55   | 1.2350          |
+| 1.2025        | 0.7168 | 56   | 1.2349          |
+| 1.2289        | 0.7296 | 57   | 1.2349          |
+| 1.2228        | 0.7424 | 58   | 1.2348          |
+| 1.1969        | 0.7552 | 59   | 1.2348          |
+| 1.2393        | 0.768  | 60   | 1.2348          |
+| 1.2783        | 0.7808 | 61   | 1.2347          |
+| 1.2625        | 0.7936 | 62   | 1.2347          |
+| 1.1973        | 0.8064 | 63   | 1.2347          |
+| 1.2449        | 0.8192 | 64   | 1.2346          |
+| 1.1992        | 0.832  | 65   | 1.2346          |
+| 1.1581        | 0.8448 | 66   | 1.2346          |
+| 1.2901        | 0.8576 | 67   | 1.2346          |
+| 1.1731        | 0.8704 | 68   | 1.2346          |
+| 1.1956        | 0.8832 | 69   | 1.2345          |
+| 1.1748        | 0.896  | 70   | 1.2345          |
+| 1.2399        | 0.9088 | 71   | 1.2345          |
+| 1.2649        | 0.9216 | 72   | 1.2345          |
+| 1.2461        | 0.9344 | 73   | 1.2345          |
+| 1.1934        | 0.9472 | 74   | 1.2345          |
+| 1.2389        | 0.96   | 75   | 1.2345          |
+| 1.2689        | 0.9728 | 76   | 1.2345          |
+| 1.2085        | 0.9856 | 77   | 1.2345          |
+| 1.226         | 0.9984 | 78   | 1.2345          |
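Two further quantities can be read off the table.

- **Training-set size.** The Epoch column advances by 0.0128 per optimizer step. One epoch is therefore 1 / 0.0128 = 78.125 steps of 128 sequences, or roughly 10,000 training sequences. These may be packed chunks rather than raw examples.
- **Perplexity.** Assume the logged loss is the mean token-level cross-entropy in nats, as is usual for causal language modeling. Then `exp(loss)` gives perplexity.

The sketch below assumes a single device. The function names are hypothetical. Because the Epoch column is rounded to four decimals, treat the size as an estimate.

```python
import math

EPOCH_PER_STEP = 0.0128   # Epoch column increment per optimizer step (table above)
SEQS_PER_STEP = 128       # total_train_batch_size from the hyperparameter list


def approx_train_sequences(epoch_per_step: float = EPOCH_PER_STEP,
                           seqs_per_step: int = SEQS_PER_STEP) -> int:
    """Estimate training-set size as (steps per epoch) x (sequences per step)."""
    steps_per_epoch = 1.0 / epoch_per_step  # about 78.125
    return round(steps_per_epoch * seqs_per_step)


def perplexity(mean_ce_loss: float) -> float:
    """Perplexity, assuming the loss is a mean cross-entropy in nats."""
    return math.exp(mean_ce_loss)


first_val, last_val = 1.2399, 1.2345  # validation loss at steps 1 and 78
print(approx_train_sequences())  # 10000
print(f"{perplexity(first_val):.3f} -> {perplexity(last_val):.3f}")
print(f"relative drop: {(first_val - last_val) / first_val:.2%}")  # relative drop: 0.44%
```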
+### Framework versions
+- PEFT 0.10.0
+- Transformers 4.43.0.dev0
+- PyTorch 2.2.2+cu121
+- Datasets 2.19.2
+- Tokenizers 0.19.1
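The standard-library `importlib.metadata` module can compare a local environment against the versions listed above. A few caveats apply to the sketch below:

- The distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names. Treating them as the packages meant here is an assumption.
- `4.43.0.dev0` is a development build of Transformers, so an exact match generally requires installing from source.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Versions listed under "Framework versions", keyed by (assumed) PyPI distribution name.
EXPECTED: Dict[str, str] = {
    "peft": "0.10.0",
    "transformers": "4.43.0.dev0",
    "torch": "2.2.2+cu121",
    "datasets": "2.19.2",
    "tokenizers": "0.19.1",
}


def _installed(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str] = EXPECTED,
    lookup: Callable[[str], Optional[str]] = _installed,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from `expected` to what was found."""
    found = {name: lookup(name) for name in expected}
    return {name: got for name, got in found.items() if got != expected[name]}


for pkg, got in version_mismatches().items():
    print(f"{pkg}: expected {EXPECTED[pkg]}, found {got or 'not installed'}")
```

The `lookup` parameter is injectable, so the comparison logic can be exercised without the packages installed.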