An 11B T5 model trained on the P3 (T0 split) dataset for 20,000 steps with a batch size of 2048, a maximum input sequence length of 1024, a maximum output sequence length of 256, and the Adafactor optimizer with a constant learning rate of 0.001. The model is initialized from the T5 v1.1 lm-adapt checkpoint and fully finetuned.

For more details, see *HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation*.

Performance on T0 held-out tasks (average accuracy across prompts using rank classification):

| Model | ANLI (avg) | HellaSwag | StoryCloze | CB | COPA | RTE | WiC | WSC | WinoGrande | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| T0-11B | 41.0 | 33.6 | 92.4 | 70.1 | 91.5 | 81.0 | 56.1 | 61.1 | 59.9 | 65.2 |
| hypertask_T0_11B (this model) | 46.8 | 34.1 | 98.2 | 81.2 | 96.6 | 84.0 | 52.1 | 62.6 | 64.8 | 68.9 |
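Rank classification, as used in the evaluation above, scores each candidate answer by its log-likelihood under the model and picks the highest-scoring one, rather than sampling free-form text. A minimal, model-agnostic sketch of the idea is below; the `toy_score` function is a hypothetical stand-in, and in a real evaluation `score_fn` would sum the model's log-probabilities of each option's tokens conditioned on the prompt.

```python
import math

def rank_classify(score_fn, prompt, options):
    # Score every candidate answer, then return the option with the
    # highest total log-probability under the model.
    scores = {opt: score_fn(prompt, opt) for opt in options}
    return max(scores, key=scores.get)

# Hypothetical stand-in scorer for illustration only: a real scorer
# would run the T5 model and sum per-token log-probs of the option.
def toy_score(prompt, option):
    probs = {"yes": 0.7, "no": 0.2, "maybe": 0.1}
    return math.log(probs[option])

prompt = "Premise: ... Hypothesis: ... Does the premise entail the hypothesis?"
best = rank_classify(toy_score, prompt, ["yes", "no", "maybe"])
print(best)  # -> yes
```

Because every prompt template defines a fixed set of answer strings, this procedure turns generation into classification, which is why accuracy is reported as the average across prompts.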
