---
language:
- en
license: mit
tags:
- whisper
- automatic-speech-recognition
- speech
- audio
- transcription
- phone-calls
- conversational
pipeline_tag: automatic-speech-recognition
---
# Whisper to Oliver

**Fine-tuned Whisper for Real-World Conversational Audio**

[![Model on HF](https://img.shields.io/badge/🤗-Model%20on%20HF-yellow.svg)](https://huggingface.co/olib-ai/whisper-to-oliver)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Olib AI](https://img.shields.io/badge/🌐-Olib%20AI-green.svg)](https://www.olib.ai)

## 🎯 Model Description

**Whisper to Oliver** is a fine-tuned version of OpenAI's `whisper-large-v3-turbo` model, optimized for real-world conversational audio recorded under challenging acoustic conditions. It is designed for transcribing phone calls and conversations where audio quality may be compromised.

### ✨ Key Features

- 🎙️ **Enhanced Performance on Poor Quality Audio**: Fine-tuned on 170,000 conversational audio samples with minor to poor audio quality
- 📞 **Phone Call Optimized**: Trained on short conversational segments typical of phone calls
- 🚀 **Turbo Performance**: Inherits the speed advantages of whisper-large-v3-turbo
- 💼 **Enterprise Ready**: Developed by [Olib AI](https://www.olib.ai) for business applications
- 🔧 **FP32 Precision**: Full precision model for maximum accuracy

## 📊 Training Details

- **Base Model**: [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo)
- **Training Dataset**: 170,000 conversational audio samples
- **Audio Characteristics**: Minor to poor quality recordings
- **Focus**: Short conversational segments typical of phone interactions
- **Developer**: [Olib AI](https://www.olib.ai) - Building AI Services for Businesses

## 🚀 Usage

### Using the Transformers Library

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
# Note: the checkpoint is stored in FP32. On GPU it is cast to FP16 here for
# speed; set torch_dtype = torch.float32 to keep full precision.
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "olib-ai/whisper-to-oliver"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe audio
result = pipe("audio.mp3")
print(result["text"])
```

### Advanced Usage with Parameters

```python
# For better results with phone calls or poor quality audio.
# Reuses the `pipe` object created in the previous example.
result = pipe(
    "phone_call.mp3",
    chunk_length_s=30,       # split long recordings into 30-second chunks
    batch_size=16,           # number of chunks transcribed per batch
    return_timestamps=True,  # also return timestamps alongside the text
)
print(result["text"])
```

## 📈 Performance

Whisper to Oliver shows significant improvements over the base model when dealing with:

- 📞 Phone call recordings
- 🎙️ Low-quality microphone inputs
- 🌐 Conversational speech with background noise
- 💬 Short dialogue segments

## 🎯 Intended Use

This model is designed for:

- Customer service call transcription
- Meeting transcription with variable audio quality
- Voice assistant applications
- Real-time conversation analysis
- Accessibility applications for hearing-impaired users

## ⚠️ Limitations and Ethical Considerations

In line with the ethical guidelines of the base Whisper model, this model:

- Should not be used to transcribe recordings without consent
- Is not recommended for "subjective classification" tasks
- Should undergo robust evaluation before deployment in high-risk contexts
- May show performance variations across different languages and demographics

## 📜 License

This model is released under the **MIT License**, which allows commercial and non-commercial use with proper attribution.

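## 🧪 Checking Accuracy on Your Own Audio

This card does not report word error rate (WER) figures, and the limitations above call for evaluation before deployment. The following is a minimal sketch of how WER could be computed between a reference transcript and a model transcript. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not part of this model's tooling. The sketch only lowercases and splits on whitespace, so punctuation differences count as errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; it is not shipped with the model.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example transcripts: one of four reference words is dropped.
reference = "please hold the line"
hypothesis = "please hold line"
print(word_error_rate(reference, hypothesis))  # 0.25
```

In practice, `hypothesis` would be the `result["text"]` string returned by the pipeline, compared against a human-verified transcript of the same recording. Running the same comparison with the base model gives a like-for-like check on your own data.
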
## 📖 Citation

If you use this model in your research or applications, please cite both our work and the original Whisper paper:

```bibtex
@misc{whisper-to-oliver,
  author = {{Olib AI}},
  title = {Whisper to Oliver: Fine-tuned Whisper for Real-World Conversational Audio},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/olib-ai/whisper-to-oliver}},
}

@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```

## 👥 About Olib AI

[Olib AI](https://www.olib.ai) specializes in building AI services for businesses. Our team focuses on creating practical AI solutions that solve real-world problems.

**Contact Us:**

- 🌐 Website: [www.olib.ai](https://www.olib.ai)
- 📧 Akram H. Sharkar: [akram@olib.ai](mailto:akram@olib.ai)
- 📧 Maya M. Sharkar: [maya@olib.ai](mailto:maya@olib.ai)
- 💻 GitHub: [https://github.com/Olib-AI](https://github.com/Olib-AI)

---

Built with ❤️ by Olib AI