---
license: mit
base_model:
  - unsloth/phi-4
library_name: transformers
---

WAIDWML - What Am I Doing With My Life?

(8 Phi-4s in a trenchcoat)

Rationale

So there I was, finding some inspiration to tune stuff but lacking the disposable funds to do anything with the larger models. Enter Phi-4, a model designed for productivity... Initially this was just going to be a sequential series of finetunes, starting from the baseline Phi-4 and gradually adding more datasets until I either got bored or it got good, but then I had an idea: what if I just MoE'd it?

Yeah.

As a proof of concept, this wasn't too bad. The end result is... interesting, to say the least.

Training

As mentioned above, this was done in "phases", each with a separate dataset. Most phases were done with a max_seq_length of 32k; a few were dropped to 16k to make sure they fit on the hardware.

The learning rate was all over the place, but generally somewhere between 4e-6 and 1e-5. These were all separate LoRAs using r=64 and alpha=32 with rsLoRA enabled. Epochs were 2 or 3 for everything except c2, as that would have taken far too long.
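
For reference, here's a minimal sketch of what each phase's adapter config could look like in peft. The r/alpha/rsLoRA values are the ones stated above; the target modules and dropout are assumptions:

```python
# Sketch of the per-phase LoRA config; r, alpha, and rsLoRA as stated above.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,              # LoRA rank used for every phase
    lora_alpha=32,     # alpha as stated above
    use_rslora=True,   # rank-stabilized LoRA scaling
    lora_dropout=0.0,  # assumption; not specified above
    target_modules=[   # assumption: standard attention + MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```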

  • p1: Private RP dataset (RPT-Varied-Small)
  • p2: TheDrummer/AmoralQA-v2
  • p3: AIRRC/Eudaimonic
  • p4: Two private RP datasets (cc-gpt4-sfw-sharegpt & cc-gpt4-nsfw-sharegpt)
  • p5: A random subset of the infamous "c2"-logs dataset, cleaned and deduped (approx. 30%)
  • p6: Private RP dataset (RPT-Varied-Small_v1.5)
  • p7: NewEden/PIPPA-Mega-Filtered
  • p8: Squish42/bluemoon-fandom-1-1-rp-cleaned

(Note: the RPT-Varied-Small and RPT-Varied-Small_v1.5 datasets are due to be released after I manually verify their fitness.)

Once all the LoRAs were trained, I separately merged each one into the base model, then used mergekit (config) to "merge" the eight resulting models into a MoE. I chose to initialize the router randomly, as I was going to train that part later. After that, I trained the routing layers for 8 epochs with lr = 1e-6, using grimulkan/LimaRP-augmented as the dataset. It took roughly 8.5 hours on a 6xA40 instance on RunPod.
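
To make the pipeline concrete, here's a rough sketch of the two transformers/peft steps around the mergekit merge: folding each phase's LoRA into its own copy of the base, then freezing everything except the routing layers for the gate-training run. The paths and the "gate" name filter are assumptions, not the exact scripts used:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Step 1: fold each phase's LoRA back into a full copy of the base model.
for phase in ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]:
    base = AutoModelForCausalLM.from_pretrained(
        "unsloth/phi-4", torch_dtype=torch.bfloat16
    )
    expert = PeftModel.from_pretrained(base, f"loras/{phase}").merge_and_unload()
    expert.save_pretrained(f"experts/{phase}")  # these eight become the MoE experts

# Step 2 (after the mergekit MoE merge): train only the routing layers.
moe = AutoModelForCausalLM.from_pretrained("moe-merge", torch_dtype=torch.bfloat16)
for name, param in moe.named_parameters():
    param.requires_grad = "gate" in name  # assumption: router params named "*gate*"
```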

Recommended Settings

Phi-4 prompt format. What I used for my tests:

  • Temp 1
  • minP 0.05
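
A usage sketch with those samplers via transformers' generate() (the repo id here is a placeholder, and min_p support needs a reasonably recent transformers release):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "rAIfle/WAIDWML"  # placeholder repo id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

# apply_chat_template handles the Phi-4 prompt format.
messages = [{"role": "user", "content": "Hello there."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(
    inputs,
    do_sample=True,
    temperature=1.0,  # Temp 1
    min_p=0.05,       # minP 0.05
    max_new_tokens=256,
)
print(tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```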

FAQ

Q: Why not do anything constructive, like GRPO-tune a model of usable size?
A: Where's the fun in that?

Q: Are you, like, okay?
A: Objectively? Probably not. Subjectively? Never better.

Q: You know this still sucks for RP, right?
A: Yup. Should have pivoted to reasoning and code once R1 hit, but sunk cost and all kept me on this trajectory.