metadata
license: apache-2.0
base_model: HKUSTAudio/Llasa-1B-Multilingual
datasets:
  - amu-cai/pl-asr-bigos-v2
language:
  - pl
tags:
  - speech
  - audio
  - polish
  - llama
  - tts
  - fine-tuned
  - text-to-speech
model-index:
  - name: From Llasa to Łazanki
    results: []

From Llasa to Łazanki: Fine-tuned Llasa-1B on Polish Speech

This is a fine-tuned version of HKUSTAudio/Llasa-1B-Multilingual, adapted for Polish Text-to-Speech (TTS).
It was fine-tuned on the mozilla-common_voice_15-23 subset of the amu-cai/pl-asr-bigos-v2 dataset, which contains high-quality Polish speech recordings suitable for training TTS models.


🧠 Base Model

The base model is Llasa-1B-Multilingual, developed by HKUST. Its approach reuses the LLaMA-initialized text BPE tokenizer, which can handle multilingual text without language-specific G2P (grapheme-to-phoneme) systems.
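To illustrate why a BPE tokenizer can skip G2P, the sketch below shows how Polish text reduces to plain UTF-8 bytes. It assumes the tokenizer is byte-level, which the text above does not state, and it does not call the real Llasa tokenizer. The helper names `utf8_byte_ids` and `from_byte_ids` are hypothetical and only serve the example.

```python
# Minimal sketch: byte-level view of Polish text.
# Assumption: the tokenizer builds its BPE merges on top of UTF-8 bytes.
# This is NOT the Llasa tokenizer, only the underlying idea.

def utf8_byte_ids(text: str) -> list[int]:
    """Return the UTF-8 byte values of `text` (each in the range 0..255)."""
    return list(text.encode("utf-8"))


def from_byte_ids(ids: list[int]) -> str:
    """Invert utf8_byte_ids: rebuild the original string from byte values."""
    return bytes(ids).decode("utf-8")


sample = "Łazanki"
ids = utf8_byte_ids(sample)

# "Ł" (U+0141) is encoded as two bytes; the six ASCII letters take one byte each.
print(ids)
print(from_byte_ids(ids))
```

Because every character, including Polish diacritics, maps to a fixed byte alphabet, no pronunciation dictionary or phoneme inventory is needed on the text side. The model learns the mapping from text to speech tokens directly from data.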


🗣 Fine-tuning Details

  • Dataset: amu-cai/pl-asr-bigos-v2, mozilla-common_voice_15-23 subset
  • Language: 🇵🇱 Polish
  • Task: Text-to-speech (TTS)
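
The dataset and subset listed above could be loaded roughly as follows. This is a minimal sketch, not the training pipeline used for this model. It assumes the Hugging Face `datasets` library is installed, that the subset name doubles as the dataset configuration name, and that a `train` split exists. Access to the dataset may also require accepting its terms on the Hub. `load_polish_subset` is a hypothetical helper.

```python
# Sketch of loading the fine-tuning data; the names below come from the model card,
# while the split name and the helper itself are assumptions.

DATASET_ID = "amu-cai/pl-asr-bigos-v2"
SUBSET = "mozilla-common_voice_15-23"


def load_polish_subset(split: str = "train"):
    """Load one split of the Polish subset (downloads data on first call)."""
    # Imported lazily so the constants above are usable without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, SUBSET, split=split)


# Example usage (requires network access and possibly Hub authentication):
# ds = load_polish_subset("train")
# print(ds)
```

Inspecting the returned object's columns before training is advisable, since the field names for audio and transcripts are not listed in this card.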