whisper-hf-rslora

This model is a fine-tuned version of openai/whisper-large-v3-turbo on compulsion/heart-failure-audio. It achieves the following results on the evaluation set:

Loss: 0.6919
Wer: 0.2424
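For readers unfamiliar with the metric, WER (word error rate) is the word-level edit distance between reference and hypothesis, divided by the number of reference words. The following is a minimal sketch of that definition in plain Python. It is not the evaluation code used for this card, and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences: one substitution plus one deletion over 5 words.
print(word_error_rate("the patient takes furosemide daily",
                      "the patient take furosemide"))
```

On this scale, the reported Wer of 0.2424 corresponds to roughly 24 word errors per 100 reference words.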

Model description

A PEFT rank-stabilized LoRA (rsLoRA) adapter for whisper-large-v3-turbo, fine-tuned on heart failure audio data. The audio is conversational and longitudinal, and focuses on chronic illness management and care coordination in a community-based healthcare setting.
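As a rough illustration of what "rank-stabilized" means: standard LoRA scales the low-rank update by lora_alpha / r, while rsLoRA scales it by lora_alpha / sqrt(r), so the update does not shrink as quickly at higher ranks. In PEFT this is enabled with `use_rslora=True` in `LoraConfig`. The sketch below only compares the two scaling factors. The rank and alpha values are hypothetical, since this card does not state the ones used.

```python
import math


def lora_scaling(lora_alpha: float, r: int) -> float:
    """Standard LoRA scaling factor: alpha / r."""
    return lora_alpha / r


def rslora_scaling(lora_alpha: float, r: int) -> float:
    """Rank-stabilized LoRA scaling factor: alpha / sqrt(r)."""
    return lora_alpha / math.sqrt(r)


# Hypothetical values for illustration only; the card does not report
# the adapter's actual r or lora_alpha.
ALPHA = 32
for rank in (8, 16, 64):
    print(f"r={rank:>2}  LoRA={lora_scaling(ALPHA, rank):.3f}  "
          f"rsLoRA={rslora_scaling(ALPHA, rank):.3f}")
```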

Intended uses & limitations

This adapter is intended for automatic speech recognition (ASR) tasks, specifically in the heart failure domain.
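A minimal inference sketch, assuming the adapter is published under the repository id `compulsi0n/whisper-hf-rslora` (an assumption, adjust as needed) and that `audio` is a 1-D float array sampled at 16 kHz. The helper names `load_model_and_processor` and `transcribe` are hypothetical. The heavy imports sit inside the functions so the module can be imported without loading any weights.

```python
BASE_MODEL_ID = "openai/whisper-large-v3-turbo"
# Assumption: the adapter's Hub repository id; replace with the real one.
ADAPTER_ID = "compulsi0n/whisper-hf-rslora"


def load_model_and_processor(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base Whisper model and attach the LoRA adapter on top."""
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(base_id)
    base = WhisperForConditionalGeneration.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, processor


def transcribe(model, processor, audio, sampling_rate=16000):
    """Transcribe one mono waveform (1-D float array at 16 kHz)."""
    import torch

    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        ids = model.generate(input_features=inputs.input_features)
    return processor.batch_decode(ids, skip_special_tokens=True)[0]
```

Typical usage would be `model, processor = load_model_and_processor()` followed by `transcribe(model, processor, audio)`. Recordings longer than Whisper's 30-second window need to be chunked first.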

Benchmark (base whisper-large-v3-turbo vs. fine-tuned rank-stabilized LoRA adapter)

Normalized scores account for PHI redactions and apply Transformers' BasicTextNormalizer.

Model	Raw WER (%)	Normalized WER (%)
Baseline	35.00	26.71
rsLoRA	26.18	20.71
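To put the benchmark in perspective, the absolute and relative WER reductions can be derived directly from the table. The sketch below only re-uses the four numbers reported above. The helper name `relative_reduction` is illustrative.

```python
# WER values (%) copied from the benchmark table above.
BENCHMARK = {
    "Baseline": {"raw": 35.00, "normalized": 26.71},
    "rsLoRA": {"raw": 26.18, "normalized": 20.71},
}


def relative_reduction(baseline: float, finetuned: float) -> float:
    """Relative WER reduction in percent, measured against the baseline."""
    return (baseline - finetuned) / baseline * 100


for metric in ("raw", "normalized"):
    base = BENCHMARK["Baseline"][metric]
    tuned = BENCHMARK["rsLoRA"][metric]
    print(f"{metric}: {base - tuned:.2f} points absolute, "
          f"{relative_reduction(base, tuned):.1f}% relative")
```

By this calculation the adapter reduces raw WER by 8.82 points (about 25.2% relative) and normalized WER by 6.00 points (about 22.5% relative).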

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 4
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_steps: 500
num_epochs: 8
mixed_precision_training: Native AMP
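Two of these values are derived rather than set directly, and the following sketch shows how they relate. It assumes a single training device (consistent with the reported total of 16) and mirrors the usual definition of a constant-with-warmup schedule: the learning rate ramps linearly from 0 to the base value over the warmup steps and then stays constant. The function names are illustrative and this is not the training script itself.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step."""
    return per_device * grad_accum * num_devices


def constant_with_warmup_lr(step: int, base_lr: float = 1e-5,
                            warmup_steps: int = 500) -> float:
    """Linear warmup from 0 to base_lr, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


print(effective_batch_size(4, 4))  # train_batch_size x gradient_accumulation_steps
for step in (0, 92, 250, 500, 736):
    print(step, constant_with_warmup_lr(step))
```

With 736 optimizer steps in total, the 500 warmup steps cover roughly the first two thirds of training, so the full learning rate of 1e-05 is only reached during epoch 6.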

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
2.3062	1.0	92	1.1343	0.2388
1.0317	2.0	184	0.7145	0.2620
0.6833	3.0	276	0.6606	0.2105
0.5934	4.0	368	0.6292	0.2122
0.5104	5.0	460	0.6347	0.2521
0.4392	6.0	552	0.6444	0.2729
0.3653	7.0	644	0.6701	0.2198
0.3178	8.0	736	0.6919	0.2424
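The headline results at the top of the card correspond to the final epoch, but the table shows that other epochs score better on individual metrics. A small sketch that picks the best row per metric from the values above (the `EpochResult` type is illustrative):

```python
from typing import NamedTuple


class EpochResult(NamedTuple):
    train_loss: float
    epoch: int
    step: int
    val_loss: float
    wer: float


# Rows copied from the training results table above.
RESULTS = [
    EpochResult(2.3062, 1, 92, 1.1343, 0.2388),
    EpochResult(1.0317, 2, 184, 0.7145, 0.2620),
    EpochResult(0.6833, 3, 276, 0.6606, 0.2105),
    EpochResult(0.5934, 4, 368, 0.6292, 0.2122),
    EpochResult(0.5104, 5, 460, 0.6347, 0.2521),
    EpochResult(0.4392, 6, 552, 0.6444, 0.2729),
    EpochResult(0.3653, 7, 644, 0.6701, 0.2198),
    EpochResult(0.3178, 8, 736, 0.6919, 0.2424),
]

best_by_loss = min(RESULTS, key=lambda r: r.val_loss)
best_by_wer = min(RESULTS, key=lambda r: r.wer)

print(f"lowest validation loss: epoch {best_by_loss.epoch} ({best_by_loss.val_loss})")
print(f"lowest WER: epoch {best_by_wer.epoch} ({best_by_wer.wer})")
```

Validation loss is lowest at epoch 4 and WER at epoch 3, while training loss keeps falling through epoch 8. That pattern is consistent with overfitting in the later epochs, though the card does not state which checkpoint was published.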

Framework versions

PEFT 0.15.2
Transformers 4.52.4
Pytorch 2.6.0+cu124
Datasets 3.6.0
Tokenizers 0.21.1
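To check a local environment against these versions, a small sketch using only the standard library is shown below. The distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are an assumption here. A mismatch does not necessarily mean the adapter will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
PINNED = {
    "peft": "0.15.2",
    "transformers": "4.52.4",
    "torch": "2.6.0+cu124",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def compare_versions(pinned=PINNED):
    """Return {package: (expected, installed or None, matches)}."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report


for name, (expected, installed, ok) in compare_versions().items():
    print(f"{name}: expected {expected}, installed {installed}, match={ok}")
```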
