Model Description

This model is a fine-tuned version of nllb-200-distilled-600M, adapted for French-Wolof and Wolof-French translation, with English also supported as a source and target language. It was trained on the bilalfaye/english-wolof-french-translation and bilalfaye/english-wolof-french-translation-bis datasets, which were preprocessed to improve translation quality.

The model supports translation in the following directions:

  • Wolof to French
  • French to Wolof
  • English to Wolof
  • Wolof to English
  • French to English
  • English to French
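
The six directions above are every ordered pair of the three languages. A minimal sketch of that enumeration, using the NLLB language codes that appear in the usage examples (wol_Latn, fra_Latn, eng_Latn); the LANG_CODES dictionary and the supported_directions helper are illustrative names, not part of the model or of the transformers library:

```python
from itertools import permutations

# Language codes as passed to the NLLB tokenizer in the usage examples.
LANG_CODES = {
    "Wolof": "wol_Latn",
    "French": "fra_Latn",
    "English": "eng_Latn",
}

def supported_directions():
    """Return every (source_code, target_code) pair between the three languages."""
    return list(permutations(LANG_CODES.values(), 2))

for src, tgt in supported_directions():
    print(f"{src} -> {tgt}")
```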

Try the demo application: https://huggingface.co/spaces/bilalfaye/WoFrEn-Translator

How to Use

1. Manual Inference
Install the required libraries (NllbTokenizer relies on sentencepiece):

!pip install transformers sentencepiece torch

Python code for translation:

from transformers import NllbTokenizer, AutoModelForSeq2SeqLM
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model_load_name = 'bilalfaye/nllb-200-distilled-600M-wo-fr-en'

# Load model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(model_load_name).to(device)
tokenizer = NllbTokenizer.from_pretrained(model_load_name)

def translate(
    text, src_lang='wol_Latn', tgt_lang='fra_Latn',
    a=32, b=3, max_input_length=1024, num_beams=4, **kwargs
):
    """Turn a text or a list of texts into a list of translations.

    a and b bound the output length: max_new_tokens = a + b * input length.
    """
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    inputs = tokenizer(
        text, return_tensors='pt', padding=True, truncation=True,
        max_length=max_input_length
    )
    model.eval()
    result = model.generate(
        **inputs.to(model.device),
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
        num_beams=num_beams, **kwargs
    )
    return tokenizer.batch_decode(result, skip_special_tokens=True)

# Example usage
print(translate("Ndax mën nga ko waxaat su la neexee?", src_lang="wol_Latn", tgt_lang="fra_Latn")[0])
print(translate("Ndax mën nga ko waxaat su la neexee?", src_lang="wol_Latn", tgt_lang="eng_Latn")[0])
print(translate("Bonjour, où allez-vous?", src_lang="fra_Latn", tgt_lang="wol_Latn")[0])
print(translate("Bonjour, où allez-vous?", src_lang="fra_Latn", tgt_lang="eng_Latn")[0])
print(translate("Hello, how are you?", src_lang="eng_Latn", tgt_lang="wol_Latn")[0])
print(translate("Hello, how are you?", src_lang="eng_Latn", tgt_lang="fra_Latn")[0])
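
The max_new_tokens argument in translate grows linearly with the input length: a is a fixed allowance and b is a per-input-token multiplier. Because the batch is padded, the input length is that of the longest text in the batch. A small sketch of that arithmetic on its own, with a hypothetical helper name output_budget that is not part of the model card's code:

```python
def output_budget(num_input_tokens, a=32, b=3):
    """Maximum number of new tokens allowed for an input of the given length.

    Mirrors the expression int(a + b * input_length) used in translate().
    """
    return int(a + b * num_input_tokens)

# A 10-token input may produce up to 32 + 3 * 10 = 62 new tokens.
print(output_budget(10))  # 62
```

Raising a or b allows longer outputs at the cost of generation time; both can be passed directly to translate.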

2. Inference with Pipeline
Install the required library:

!pip install transformers

Python code using the pipeline:

import torch
from transformers import pipeline

model_name = 'bilalfaye/nllb-200-distilled-600M-wo-fr-en'
device = "cuda" if torch.cuda.is_available() else "cpu"

translator = pipeline("translation", model=model_name, device=device)

print(translator("Ndax mën nga ko waxaat su la neexee?", src_lang="wol_Latn", tgt_lang="fra_Latn")[0]['translation_text'])
print(translator("Bonjour, où allez-vous?", src_lang="fra_Latn", tgt_lang="wol_Latn")[0]['translation_text'])
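
When translating several sentences, it can be convenient to unwrap the pipeline output into plain strings. The sketch below assumes that the translator, when called with a list of texts, returns one dict with a 'translation_text' key per input, as in the single-sentence calls above. The translate_batch helper and the fake_translator stand-in are hypothetical names used only to show the call shape without downloading the model:

```python
def translate_batch(translator, texts, src_lang, tgt_lang):
    """Translate a list of strings and return plain strings.

    `translator` is any callable with the same call shape as the pipeline:
    it takes a list of texts plus src_lang/tgt_lang and is assumed to
    return one dict with a 'translation_text' key per input.
    """
    results = translator(texts, src_lang=src_lang, tgt_lang=tgt_lang)
    return [item["translation_text"] for item in results]

# Stand-in that imitates the output shape; it does not translate anything.
def fake_translator(texts, src_lang, tgt_lang):
    return [{"translation_text": f"[{src_lang}->{tgt_lang}] {t}"} for t in texts]

print(translate_batch(fake_translator, ["Bonjour", "Merci"], "fra_Latn", "wol_Latn"))
```

In real use, the translator object created with pipeline(...) would be passed in place of fake_translator.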

Package Versions

This model was developed and tested using the following package versions:

  • transformers: 4.41.2
  • torch: 2.4.0+cu121
  • datasets: 3.2.0
  • sentencepiece: 0.2.0
  • sacrebleu: 2.5.1
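
To compare a local environment against the versions listed above, a check along these lines may help. It is a sketch using the standard library's importlib.metadata; the TESTED_VERSIONS dictionary and find_mismatches helper are illustrative names, and local build suffixes such as "+cu121" are ignored in the comparison:

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above, without local build suffixes.
TESTED_VERSIONS = {
    "transformers": "4.41.2",
    "torch": "2.4.0",
    "datasets": "3.2.0",
    "sentencepiece": "0.2.0",
    "sacrebleu": "2.5.1",
}

def find_mismatches(expected, lookup=version):
    """Return {package: installed_version_or_None} for packages that differ."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = installed
    return mismatches

# An empty dict means every package matches the tested version.
print(find_mismatches(TESTED_VERSIONS))
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the tested one.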

Author

Bilal Faye

Feel free to reach out for questions or improvements!

Model size: 615M parameters (Safetensors, tensor type F32)