Longformer Encoder-Decoder (LED) Fine-tuned for Generative Q&A

This model uses AllenAI's Longformer Encoder-Decoder (LED) as its base model, which supports very long contexts. It is trained to answer questions based on a given context, generating answers of up to 512 tokens.

This model is a fine-tuned version of allenai/led-base-16384 on the long-form question answering (LFQA) dataset stefanbschneider/lfqa-max-answer-length-512.

I used the script led-finetune-lfqa-train.py in this repo to fine-tune the model on an NVIDIA RTX 4070 Ti SUPER.

For details, see my blog post: Fine-Tuning a Pre-Trained LLM

Intended uses & limitations

Intended use: Generative/abstractive question answering with potentially very long contexts and multi-sentence answers.

Limitations: The model received only limited fine-tuning, so it tends to ramble and its generated answers do not always make sense.
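A minimal inference sketch for the intended use described above. It assumes that the question and context are passed to the tokenizer as a text pair and that global attention is placed on the first token only, which is the usual recommendation for LED; neither detail is confirmed by this card, so check the training script before relying on it. The helper name first_token_global_mask is made up for illustration.

```python
MODEL_ID = "stefanbschneider/led-base-16384-lfqa-ans-len-512"


def first_token_global_mask(attention_mask):
    """Return a mask with 1 at position 0 of every sequence and 0 elsewhere."""
    return [[1 if i == 0 else 0 for i in range(len(row))] for row in attention_mask]


def answer_question(question, context, max_new_tokens=512):
    """Generate an answer to `question` from `context` (downloads the model)."""
    # Heavy imports live here so the helper above works without them.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Assumption: question and context are encoded together as a text pair.
    inputs = tokenizer(
        question,
        context,
        return_tensors="pt",
        truncation=True,
        max_length=16384,  # context limit of the led-base-16384 base model
    )

    # LED uses local attention by default; mark the first token as global.
    global_mask = torch.tensor(
        first_token_global_mask(inputs["attention_mask"].tolist())
    )

    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            global_attention_mask=global_mask,
            max_new_tokens=max_new_tokens,  # answers were capped at 512 tokens
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling answer_question("Why is the sky blue?", some_long_context) returns the decoded answer as a string. Given the limitations above, expect the output to ramble at times.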

Training and evaluation data

The model was fine-tuned on stefanbschneider/lfqa-max-answer-length-512. Due to limited resources, I trained on only 50% of the full training set for a single epoch and evaluated on a small, fixed subset of the validation set.
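One way to reproduce this kind of subsetting is sketched below. How the 50% of the training set and the evaluation subset were actually chosen is not stated here, so the seeded random sample, the "first N examples" rule, and both function names are assumptions for illustration only.

```python
import random


def sample_fraction_indices(num_examples, fraction=0.5, seed=42):
    """Pick a reproducible random subset of example indices."""
    rng = random.Random(seed)  # seeded so the same subset is drawn every run
    subset_size = int(num_examples * fraction)
    return sorted(rng.sample(range(num_examples), subset_size))


def fixed_eval_indices(num_examples, subset_size):
    """Use the first `subset_size` examples so every evaluation sees the same data."""
    return list(range(min(subset_size, num_examples)))
```

With the datasets library, the resulting indices can be passed to Dataset.select, for example dataset["train"].select(sample_fraction_indices(len(dataset["train"]))).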

On this evaluation subset, the model achieves the following results:

Loss: 3.2574
Rouge2: 0.0416
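To give a feel for what the Rouge2 number measures, here is a simplified sketch of ROUGE-2 as the F1 score over bigram overlap between a generated answer and a reference answer. It assumes whitespace tokenization and lowercasing only; standard ROUGE implementations apply their own tokenization and optional stemming, so this will not reproduce the reported value exactly. Whether the reported score is an F1 value is also an assumption.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent word pairs after lowercasing and whitespace splitting."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(prediction, reference):
    """Simplified ROUGE-2: F1 over clipped bigram overlap."""
    pred, ref = bigrams(prediction), bigrams(reference)
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A score around 0.04 means that only a small share of bigrams is shared with the reference answers, which is consistent with the limitations noted for this model.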

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
num_epochs: 1
mixed_precision_training: Native AMP
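The hyperparameters above map roughly onto the Hugging Face Seq2SeqTrainingArguments as sketched below. The output directory is a hypothetical placeholder, and reading "Native AMP" as fp16=True is an assumption. The linear_lr function illustrates the linear schedule, assuming zero warmup steps since none are listed.

```python
TRAINING_KWARGS = {
    "output_dir": "led-lfqa-output",  # hypothetical path, not from the card
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
    "fp16": True,  # assumption: "Native AMP" refers to fp16 mixed precision
}


def build_training_arguments(**overrides):
    """Create Seq2SeqTrainingArguments from the values listed in the card."""
    # Imported lazily so the plain dictionary is usable without transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**{**TRAINING_KWARGS, **overrides})


def linear_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

Note that fp16=True generally requires a CUDA-capable GPU; on a CPU-only machine, pass fp16=False as an override.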

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge2
3.6685	0.0197	1000	3.5648	0.0353
3.641	0.0395	2000	3.5259	0.0366
3.558	0.0592	3000	3.5224	0.0411
3.6013	0.0789	4000	3.4833	0.0327
3.5962	0.0986	5000	3.4795	0.0349
3.5325	0.1184	6000	3.4863	0.035
3.5618	0.1381	7000	3.4671	0.041
3.5344	0.1578	8000	3.4576	0.0339
3.515	0.1775	9000	3.4483	0.038
3.4672	0.1973	10000	3.4422	0.0343
3.448	0.2170	11000	3.4324	0.0369
3.5145	0.2367	12000	3.4304	0.0353
3.4565	0.2565	13000	3.4169	0.0382
3.4446	0.2762	14000	3.4061	0.0376
3.5298	0.2959	15000	3.3983	0.0368
3.459	0.3156	16000	3.3971	0.0387
3.4825	0.3354	17000	3.3985	0.04
3.3953	0.3551	18000	3.4034	0.0389
3.3849	0.3748	19000	3.3878	0.0345
3.4979	0.3945	20000	3.3890	0.038
3.4667	0.4143	21000	3.3744	0.0381
3.4154	0.4340	22000	3.3882	0.0376
3.4191	0.4537	23000	3.3585	0.0437
3.4372	0.4734	24000	3.3592	0.0395
3.4556	0.4932	25000	3.3557	0.0384
3.4234	0.5129	26000	3.3596	0.0386
3.413	0.5326	27000	3.3565	0.0329
3.3855	0.5524	28000	3.3475	0.0388
3.4496	0.5721	29000	3.3392	0.0372
3.4472	0.5918	30000	3.3332	0.0405
3.4109	0.6115	31000	3.3286	0.0413
3.4177	0.6313	32000	3.3194	0.046
3.4429	0.6510	33000	3.3043	0.0438
3.3835	0.6707	34000	3.2992	0.0411
3.4086	0.6904	35000	3.2984	0.04
3.4113	0.7102	36000	3.2973	0.0393
3.3986	0.7299	37000	3.2920	0.0418
3.3741	0.7496	38000	3.2915	0.0391
3.3473	0.7694	39000	3.2865	0.0434
3.3613	0.7891	40000	3.2776	0.0429
3.3411	0.8088	41000	3.2849	0.0385
3.2708	0.8285	42000	3.2760	0.0411
3.3755	0.8483	43000	3.2715	0.04
3.3551	0.8680	44000	3.2734	0.0363
3.3064	0.8877	45000	3.2678	0.0394
3.2962	0.9074	46000	3.2663	0.0434
3.2761	0.9272	47000	3.2658	0.0421
3.3495	0.9469	48000	3.2626	0.0433
3.3016	0.9666	49000	3.2600	0.0427
3.2545	0.9863	50000	3.2574	0.0416
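A few quantities can be read off the table with simple arithmetic, as sketched below. The estimate of training examples assumes one optimizer step per batch of 2 on a single device (no gradient accumulation is listed), and the perplexity assumes the validation loss is the mean per-token cross-entropy in nats.

```python
import math

# (step, epoch fraction, validation loss) from the first and last table rows.
FIRST_ROW = (1000, 0.0197, 3.5648)
LAST_ROW = (50000, 0.9863, 3.2574)
TRAIN_BATCH_SIZE = 2


def steps_per_epoch(step, epoch_fraction):
    """Estimate the total steps in one epoch from a logged (step, epoch) pair."""
    return step / epoch_fraction


def perplexity(loss):
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


estimated_steps = steps_per_epoch(LAST_ROW[0], LAST_ROW[1])
# Assumption: each step consumes exactly one batch of TRAIN_BATCH_SIZE examples.
estimated_examples = estimated_steps * TRAIN_BATCH_SIZE

print(f"steps per epoch: about {estimated_steps:,.0f}")
print(f"training examples: about {estimated_examples:,.0f}")
print(f"validation perplexity: {perplexity(FIRST_ROW[2]):.1f} -> {perplexity(LAST_ROW[2]):.1f}")
```

Under these assumptions, one epoch is roughly 50,700 steps, or about 101,000 training examples, and the validation perplexity falls from roughly 35 to roughly 26 over the run.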

Framework versions

Transformers 4.48.3
Pytorch 2.5.1+cu121
Datasets 3.2.0
Tokenizers 0.21.0
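To check whether a local environment matches these versions, a small sketch using only the standard library is shown below. The PyPI package names (for example torch for Pytorch) are assumptions, and an exact match is usually not required to run the model.

```python
from importlib import metadata

EXPECTED_VERSIONS = {
    "transformers": "4.48.3",
    "torch": "2.5.1+cu121",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def version_mismatches(expected=EXPECTED_VERSIONS):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches
```

An empty dictionary from version_mismatches() means the environment matches the versions listed above.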

Model ID: stefanbschneider/led-base-16384-lfqa-ans-len-512