Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR
Abstract
Polyglot-Lion, a compact multilingual ASR model family for Singapore's linguistic diversity, achieves competitive performance with significantly reduced training cost and improved inference speed through balanced fine-tuning of pretrained models.
We present Polyglot-Lion, a family of compact multilingual automatic speech recognition (ASR) models tailored for the linguistic landscape of Singapore, covering English, Mandarin, Tamil, and Malay. Our models are obtained by fine-tuning Qwen3-ASR-0.6B and Qwen3-ASR-1.7B exclusively on publicly available speech corpora, using a balanced sampling strategy that equalizes the number of training utterances per language and deliberately omits language-tag conditioning so that the model learns to identify languages implicitly from audio. On 12 benchmarks spanning the four target languages, Polyglot-Lion-1.7B achieves an average error rate of 14.85, competitive with MERaLiON-2-10B-ASR (14.32), a model 6x larger, while incurring a training cost of $81 on a single RTX PRO 6000 GPU compared to $18,862 for the 128-GPU baseline. Inference throughput is approximately 20x faster than MERaLiON at 0.10 s/sample versus 2.02 s/sample. These results demonstrate that linguistically balanced fine-tuning of moderate-scale pretrained models can yield deployment-ready multilingual ASR at a fraction of the cost of larger specialist systems.
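The balanced sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say whether the per-language counts are equalized by downsampling or upsampling, so the sketch assumes downsampling every language to the size of the smallest one. The function name `balance_by_language` and the record fields `audio`, `text`, and `lang` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def balance_by_language(utterances, seed=0):
    """Return a training list with the same number of utterances per language.

    Assumption: balancing is done by downsampling each language to the size
    of the smallest one (the abstract does not specify the mechanism).
    Each input record is a dict with hypothetical keys "audio", "text", "lang".
    """
    # Group utterances by their language label.
    by_lang = defaultdict(list)
    for utt in utterances:
        by_lang[utt["lang"]].append(utt)

    # Every language is cut down to the size of the smallest group.
    target = min(len(items) for items in by_lang.values())

    rng = random.Random(seed)
    balanced = []
    for lang in sorted(by_lang):  # sorted for a reproducible order
        balanced.extend(rng.sample(by_lang[lang], target))
    rng.shuffle(balanced)

    # Drop the language label so the training pairs carry only audio and
    # text, mirroring the idea of omitting language-tag conditioning.
    return [{"audio": u["audio"], "text": u["text"]} for u in balanced]
```

With a toy corpus of 5 English, 3 Mandarin, 2 Tamil, and 4 Malay utterances, this returns 8 records, 2 per language, none of which carries a language field. Downsampling discards data from the larger corpora; a real pipeline might instead upsample the smaller languages or sample with per-language weights.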
Community
Efficient Multilingual ASR for Singapore
Project Page: https://knoveleng.github.io/polyglot-lion/
Code: https://github.com/knoveleng/polyglot-lion
Models + Datasets: https://huggingface.co/collections/knoveleng/polyglot-lion