🎀 whisper-base-onnx-web-v7

Fine-tuned Whisper model for Swedish transcription, optimized for web deployment with Transformers.js.

📋 Model Details

  • Base Model: openai/whisper-base
  • Language: Swedish (sv)
  • Task: Speech Recognition / Transcription
  • Training Steps: N/A
  • License: MIT

🚀 Usage with Transformers.js

This model is optimized for browser-based transcription using Transformers.js:

import { pipeline } from '@xenova/transformers';

// Load the model
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'markusingvarsson/whisper-base-onnx-web-v7'
);

// Transcribe audio (audioFile: a URL string or a Float32Array of 16 kHz mono samples)
const result = await transcriber(audioFile, {
  language: 'sv',
  task: 'transcribe',
  chunk_length_s: 30,
  stride_length_s: 5
});

console.log(result.text);

🐍 Usage with Python

import torch
from transformers import pipeline

# Load pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="markusingvarsson/whisper-base-onnx-web-v7",
    device=0 if torch.cuda.is_available() else -1  # GPU 0 if available, otherwise CPU
)

# Transcribe
result = transcriber(
    "audio.wav",
    generate_kwargs={"language": "sv", "task": "transcribe"}
)

print(result["text"])

📊 Performance

  • Word Error Rate (WER): N/A (not yet reported)
  • Model Size (ONNX): ~95MB (quantized)
  • Inference Speed: 1-2x realtime on modern hardware
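
Since no WER is reported above, readers may want to measure it on their own Swedish test set. The following is a minimal sketch of the standard definition (word-level edit distance divided by the number of reference words); the function name `word_error_rate` is hypothetical and is not part of this model or of Transformers. It applies no text normalization (casing, punctuation), which a real evaluation would usually add.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words
    print(word_error_rate("jag heter anna", "jag heter hanna"))
```

To evaluate the model, the hypothesis would be the `text` field returned by the transcriber and the reference a human-made transcript of the same audio.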

🎯 Intended Use

This model is designed for:

  • Voice note transcription
  • Meeting transcription
  • Swedish podcast transcription
  • Real-time speech-to-text in web browsers
  • Accessibility applications

🔧 Training Details

  • Hardware: GPU/CPU
  • Batch Size: 8
  • Learning Rate: 1e-5
  • Training Loss: N/A

πŸ“ Model Files

  • *.onnx: ONNX model files for web deployment
  • config.json: Model configuration
  • tokenizer.json: Fast tokenizer for Transformers.js
  • processor_config.json: Audio processing configuration

🌐 Demo

Try the model in your browser: [Coming Soon]

πŸ“ Limitations

  • Optimized for Swedish language only
  • Best performance with clear audio (minimal background noise)
  • May struggle with heavy dialects or very fast speech
  • Maximum audio length: 30 seconds per chunk; longer recordings must be split into chunks (see chunk_length_s in the usage example)
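
To make the 30-second limit concrete, here is a simplified sketch of how a long recording can be covered by overlapping windows, using the `chunk_length_s: 30` and `stride_length_s: 5` values from the usage example. The helper `chunk_spans` is hypothetical, and the assumption that the stride applies to both sides of a chunk (so consecutive windows start 20 seconds apart) is an illustration only, not a description of the library's exact internals.

```python
def chunk_spans(total_s, chunk_s=30.0, stride_s=5.0):
    """Return (start, end) times in seconds of overlapping windows covering the audio."""
    # Assumption: stride_s of overlap is kept on each side of a chunk
    step = chunk_s - 2 * stride_s
    if step <= 0:
        raise ValueError("chunk_s must be larger than twice stride_s")

    spans = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)
        spans.append((start, end))
        if end >= total_s:  # the last window reaches the end of the audio
            break
        start += step
    return spans


if __name__ == "__main__":
    # A 70-second recording is covered by three overlapping windows
    for start, end in chunk_spans(70):
        print(f"{start:.0f}s to {end:.0f}s")
```

The overlap gives the model context at chunk boundaries, so words cut in half by one window can be recovered from the neighbouring one.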

🤝 Citation

If you use this model, please cite:

@misc{whisper_base_onnx_web_v7_2024,
  title={whisper-base-onnx-web-v7: Swedish Whisper for Web},
  author={markusingvarsson},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/markusingvarsson/whisper-base-onnx-web-v7}
}

πŸ™ Acknowledgments

  • OpenAI for the original Whisper model
  • Hugging Face for the tools and platform
  • The Swedish NLP community

📄 License

This model is released under the MIT License.
