XLM-RoBERTa model pre-trained on 2.5 TB of filtered CommonCrawl data covering 100 languages.

This model was fine-tuned on The Latin Library corpus (~15M tokens).
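
A minimal usage sketch with the Hugging Face transformers fill-mask pipeline. The model ID is taken from this card; the example sentence is illustrative:

```python
from transformers import pipeline

# Load the fine-tuned Latin model from the Hub.
fill_mask = pipeline(
    "fill-mask",
    model="Cicciokr/XLM-Roberta-Base-Latin-Uncased",
)

# XLM-RoBERTa uses <mask> as its mask token; input is
# lowercased to match the uncased training setup.
for prediction in fill_mask("gallia est omnis divisa in partes <mask>."):
    print(prediction["token_str"], prediction["score"])
```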

The dataset was cleaned as follows (a pipeline sketch appears after the list):

  • Removal of all "pseudo-Latin" text ("Lorem ipsum ...").
  • Use of CLTK for sentence splitting and normalisation.
  • Deduplication of the corpus.
  • Lowercasing of all text.
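
A sketch of such a cleaning pipeline, covering the listed steps. It assumes CLTK 1.x (the `LatinPunktSentenceTokenizer` import path may differ across CLTK versions) and a hypothetical input file; it illustrates the steps above, not the exact script used for this model:

```python
import re
from pathlib import Path

# CLTK 1.x Latin sentence tokenizer; import path is an
# assumption and may differ in other CLTK versions.
from cltk.sentence.lat import LatinPunktSentenceTokenizer

tokenizer = LatinPunktSentenceTokenizer()


def clean_corpus(raw_text: str) -> list[str]:
    """Apply the cleaning steps listed above."""
    # 1. Remove "pseudo-Latin" filler text.
    raw_text = re.sub(r"lorem ipsum[^.]*\.", "", raw_text, flags=re.IGNORECASE)

    # 2. Sentence splitting with CLTK. (The card also mentions
    # normalisation via CLTK, which is omitted here.)
    sentences = tokenizer.tokenize(raw_text)

    # 3/4. Lowercase, then deduplicate while preserving order.
    seen: set[str] = set()
    cleaned: list[str] = []
    for sent in sentences:
        sent = sent.strip().lower()
        if sent and sent not in seen:
            seen.add(sent)
            cleaned.append(sent)
    return cleaned


if __name__ == "__main__":
    # Hypothetical path to the raw corpus.
    text = Path("latin_library.txt").read_text(encoding="utf-8")
    for sentence in clean_corpus(text):
        print(sentence)
```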

Model size: 278M parameters · Tensor type: F32 · Format: Safetensors
