rAIfle committed 0d28f16 (verified) · Parent: 09e70cb

Update README.md

Files changed (1)
  1. README.md (+1 / -1)
README.md CHANGED
@@ -31,7 +31,7 @@ As mentioned above, this was done in "phases", each with a separate dataset. Mos
 
 (Note: the `RPT-Varied-Small` and `RPT-Varied-Small_v1.5` datasets are due to be released after I manually verify their fitness.)
 
- Once all LoRAs were trained, I separately merged them into the base model then I used [mergekit](https://github.com/arcee-ai/mergekit) to "merge" them into a MoE. I chose to initialize the router randomly as I was fully intent on training that part later. After that, I trained the routing layers for 8 epochs with `lr = 1e-6` and `grimulkan/LimaRP-augmented` as the dataset. It took roughly 8.5 hours on a 6xA40 instance on RunPod.
+ Once all LoRAs were trained, I separately merged them into the base model, then I used [mergekit](https://github.com/arcee-ai/mergekit) [(config)](https://huggingface.co/rAIfle/WAIDWML-Phi4-8x14B-bf16/blob/main/mergekit_moe_config.yml) to "merge" them into a MoE. I chose to initialize the router randomly, as I was going to train that part later. After that, I trained the routing layers for 8 epochs with `lr = 1e-6` and `grimulkan/LimaRP-augmented` as the dataset. It took roughly 8.5 hours on a 6xA40 instance on RunPod.
 
 ## Recommended Settings
 Phi-4 format.
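
For reference, a minimal sketch (not the actual script used here) of the first step described in the changed paragraph: folding one phase's trained LoRA adapter back into the base model with `peft`, so the result can serve as an expert checkpoint for the later MoE merge. The adapter and output paths are hypothetical placeholders, and `microsoft/phi-4` is assumed as the base checkpoint.

```python
# Illustrative sketch only: merge one trained LoRA adapter into the base model
# so the merged checkpoint can act as a single expert in the MoE merge.
# Paths below are hypothetical placeholders, not the actual artifacts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "microsoft/phi-4"          # assumed base checkpoint
ADAPTER = "./loras/phase-1"       # one phase's trained LoRA (placeholder)
OUT = "./experts/phase-1-merged"  # merged expert checkpoint (placeholder)

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()

merged.save_pretrained(OUT)
AutoTokenizer.from_pretrained(BASE).save_pretrained(OUT)
```

The per-phase checkpoints produced this way would then be combined into the 8x14B MoE by pointing mergekit's `mergekit-moe` entry point at the linked config, leaving the router randomly initialized so it can be trained afterwards as described.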