Model Description

This model checkpoint was created by further pre-training XLM-RoBERTa-base on a 1.4B-token corpus of classical texts, written mainly in Ancient Greek, Latin, French, German, English, and Italian. The corpus notably contains data from Brill-KIEM, various ancient sources from the Internet Archive, the Corpus Thomisticum, Open Greek and Latin, JSTOR, Persée, Propylaeum, Remacle, and Wikipedia. The model can be used as a checkpoint for further pre-training or as a base model for fine-tuning. It was evaluated on classics-related named-entity recognition and part-of-speech tagging, where it surpassed XLM-RoBERTa-base on all tasks and performed significantly better than similar models trained from scratch on the same corpus.
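Since the checkpoint is intended as a base model, a minimal usage sketch with the transformers library is shown below. The Latin fill-mask prompt and the top-5 decoding are illustrative assumptions, not taken from this card; the repository id is sven-nm/XLM-R-for-classics.

```python
# Minimal sketch: loading the checkpoint for masked-language-model inference.
# The Latin prompt and top-5 decoding are illustrative, not from the model card.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "sven-nm/XLM-R-for-classics"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# XLM-R tokenizers expose the mask token via tokenizer.mask_token ("<mask>").
text = f"Gallia est omnis divisa in partes {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Top-5 candidate tokens for the masked position.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top5 = logits[0, mask_pos].topk(5).indices[0].tolist()
print(tokenizer.convert_ids_to_tokens(top5))
```

For fine-tuning on token-level tasks such as the NER and POS tagging mentioned above, the same repository id can be passed to AutoModelForTokenClassification.from_pretrained with an appropriate num_labels.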

Model size: 278M parameters (Safetensors, F32)

Model repository: sven-nm/XLM-R-for-classics (fine-tuned from XLM-RoBERTa-base)