sven-nm committed on
Commit
287f46d
·
verified ·
1 Parent(s): fbf01c3

Updates README.md

Files changed (1)
  1. README.md +22 -3
README.md CHANGED
@@ -1,3 +1,22 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - la
+ - el
+ - fr
+ - en
+ - de
+ - it
+ base_model:
+ - FacebookAI/xlm-roberta-base
+ ---
+
+ # Model Description
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ This model checkpoint was created by further pre-training XLM-RoBERTa-base on a 1.4B-token corpus of classical texts written mainly in Ancient Greek, Latin, French, German, English and Italian.
+ The corpus notably contains data from [Brill-KIEM](https://github.com/kiem-group/pdfParser), various ancient sources from the Internet Archive, the [Corpus Thomisticum](https://www.corpusthomisticum.org/), [Open Greek and Latin](https://www.opengreekandlatin.org/), [JSTOR](https://about.jstor.org/whats-in-jstor/text-mining-support/), [Persée](https://www.persee.fr/), Propylaeum, [Remacle](https://remacle.org/) and Wikipedia.
+ The model can be used as a checkpoint for further pre-training or as a base model for fine-tuning.
+ The model was evaluated on classics-related named-entity recognition and part-of-speech tagging and outperformed XLM-RoBERTa-base on all tasks.
+ It also performed significantly better than similar models trained from scratch on the same corpus.
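
The README says the checkpoint can serve as a base model for fine-tuning, for example on named-entity recognition. A minimal sketch of what that could look like with the `transformers` library follows. The diff does not state the checkpoint's Hub repository id, so `CHECKPOINT_ID` is a hypothetical placeholder, and `EXAMPLE_LABELS` is an illustrative tag set rather than the one used in the reported evaluation.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder: the diff does not give the Hub repository id of
# this checkpoint, so substitute the real id before running.
CHECKPOINT_ID = "<namespace>/<model-name>"

# Illustrative tag set only (assumption); the README does not list the labels
# used in its named-entity recognition evaluation.
EXAMPLE_LABELS = ["O", "B-PERS", "I-PERS", "B-WORK", "I-WORK"]


def build_label_maps(labels: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Return the id2label / label2id mappings stored in the model config."""
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_for_token_classification(checkpoint: str, labels: List[str]):
    """Load the tokenizer and encoder, adding a freshly initialised tagging head."""
    # Imported lazily so the label helper above works without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForTokenClassification.from_pretrained(
        checkpoint,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

Calling `load_for_token_classification(CHECKPOINT_ID, EXAMPLE_LABELS)` with a real repository id would return a tokenizer and a model whose classification head still needs training on labelled data; the pre-trained encoder weights are reused as they are.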