---
license: apache-2.0
---
This model was fine-tuned on:
- The Latin Library (15M tokens)
- Perseus Project (15M tokens)

The dataset was cleaned as follows:
- Removal of all "pseudo-Latin" text ("Lorem ipsum ...").
- Use of CLTK for sentence splitting and normalisation.
- Deduplication of the corpus.
- Lowercasing of all text.
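A minimal sketch of a cleaning pipeline along these lines follows. It is not the script used for this model. To stay dependency-free, it replaces CLTK with a simple regex sentence splitter and stdlib Unicode NFC normalisation. The `PSEUDO_LATIN_MARKERS` list, the document-level filter, and all function names are hypothetical.

```python
import re
import unicodedata

# Hypothetical marker list: the card only says "pseudo-Latin" text such as
# "Lorem ipsum ..." was removed; the exact filter is not documented.
PSEUDO_LATIN_MARKERS = ("lorem ipsum",)

# Stand-in for CLTK's sentence splitter (an assumption, far less robust):
# split on whitespace that follows ".", "?" or "!".
_SENT_END = re.compile(r"(?<=[.?!])\s+")


def is_pseudo_latin(text: str) -> bool:
    """Return True if the document contains a known placeholder marker."""
    lowered = text.lower()
    return any(marker in lowered for marker in PSEUDO_LATIN_MARKERS)


def normalise(text: str) -> str:
    """Assumed normalisation: Unicode NFC plus whitespace collapsing."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def split_sentences(text: str) -> list[str]:
    """Split normalised text into sentences, dropping empty pieces."""
    return [s for s in _SENT_END.split(text) if s]


def clean_corpus(documents: list[str]) -> list[str]:
    """Filter pseudo-Latin, split, normalise, lowercase and deduplicate."""
    seen: set[str] = set()
    sentences: list[str] = []
    for doc in documents:
        if is_pseudo_latin(doc):
            continue  # drop the whole placeholder document
        for sent in split_sentences(normalise(doc)):
            sent = sent.lower()
            if sent not in seen:  # exact-match deduplication
                seen.add(sent)
                sentences.append(sent)
    return sentences


if __name__ == "__main__":
    docs = [
        "Gallia est omnis divisa in partes tres. Quarum unam incolunt Belgae.",
        "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
        "gallia est omnis divisa in partes tres.",
    ]
    print(clean_corpus(docs))
```

In this sketch, text is lowercased before deduplication, so sentences that differ only in case collapse into one. The card lists deduplication before lowercasing and does not say whether case variants were merged, so the order here is a design assumption.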