Duplicate from amurienne/kellag-m2m100
license: mit
datasets:
  - Bretagne/ofis_publik_br-fr
  - Bretagne/OpenSubtitles_br_fr
  - Bretagne/Autogramm_Breton_translation
language:
  - fr
  - br
base_model:
  - facebook/m2m100_418M
pipeline_tag: translation
library_name: transformers

Kellag

  • Kellag is a Breton -> French translation model.
  • Kellag is a temporary companion ("brother") model to Gallek, until a bidirectional fr <-> br model is ready (work in progress).
  • The current model version reached a BLEU score of 50 after 10 epochs on a 20% split of the training set.
  • For now, the model is fine-tuned in one direction only (br -> fr).
  • Training details are available in the GweLLM GitHub repository.
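For readers unfamiliar with the BLEU figure quoted above, here is a minimal sketch of how a corpus-level BLEU score can be computed. It is an illustration only: the card does not say which BLEU implementation or tokenization was used for the reported score, so this simplified version (whitespace tokens, one reference per sentence, no smoothing) will not reproduce that number exactly. The function names `ngrams` and `corpus_bleu` are hypothetical helpers, not part of the GweLLM code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis, whitespace tokens."""
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Counter intersection clips each hypothesis count by the reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # No smoothing: any n-gram order without a match yields a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Made-up sentences, for illustration only.
    model_outputs = ["j'apprends le breton à l'école aujourd'hui"]
    gold_translations = ["j'apprends le breton à l'école"]
    print(f"BLEU: {corpus_bleu(model_outputs, gold_translations):.1f}")
```

For scores that are comparable with published results, a standard tool with a fixed tokenization is generally preferable to a hand-rolled function like this one.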

Sample test code:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Hub ID of the model to load
modelcard = "amurienne/kellag-m2m100"

model = AutoModelForSeq2SeqLM.from_pretrained(modelcard)
tokenizer = AutoTokenizer.from_pretrained(modelcard)

# Breton (br) -> French (fr) translation pipeline, running on CPU
translation_pipeline = pipeline(
    "translation",
    model=model,
    tokenizer=tokenizer,
    src_lang="br",
    tgt_lang="fr",
    max_length=512,
    device="cpu",
)

# The input starts with the task prefix "treiñ eus ar brezhoneg d'ar galleg:"
# ("translate from Breton to French:"), followed by the sentence to translate
breton_text = "treiñ eus ar brezhoneg d'ar galleg: deskiñ a ran brezhoneg er skol."

result = translation_pipeline(breton_text)
print(result[0]["translation_text"])
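
The same translation can also be run without `pipeline`, which makes batching several sentences easier. The sketch below is one possible way to do it, not the author's reference code: `build_prompt` and `translate_batch` are hypothetical helper names, and it assumes that the task prefix shown in the sample above should be prepended to every input (the card shows the prefix only in that one example). It relies on the standard M2M100 tokenizer interface (`src_lang`, `get_lang_id`) and on `forced_bos_token_id` to select French as the target language.

```python
# Task prefix copied from the sample above; assumed to be expected on every input.
PREFIX = "treiñ eus ar brezhoneg d'ar galleg: "


def build_prompt(sentence):
    """Prepend the task prefix unless the sentence already carries it."""
    sentence = sentence.strip()
    return sentence if sentence.startswith(PREFIX) else PREFIX + sentence


def translate_batch(sentences, model_id="amurienne/kellag-m2m100", max_length=512):
    """Translate Breton sentences to French with generate() instead of pipeline()."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Tell the M2M100 tokenizer that the source text is Breton.
    tokenizer.src_lang = "br"
    batch = tokenizer(
        [build_prompt(s) for s in sentences],
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_length,
    )
    # Force French as the first generated token, i.e. the target language.
    generated = model.generate(
        **batch,
        forced_bos_token_id=tokenizer.get_lang_id("fr"),
        max_length=max_length,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads the model from the Hub on first use.
    for line in translate_batch(["deskiñ a ran brezhoneg er skol."]):
        print(line)
```

Loading the model inside `translate_batch` keeps the sketch short; in a long-running program the tokenizer and model would normally be loaded once and reused across calls.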

A demo is available on the Gallek Space.