Cicciokr/Roberta-Base-Latin-Uncased-V2

---
license: apache-2.0
---
This model is fine-tuned on:
- The Latin Library (15M tokens)
- Perseus Project (15M tokens)

The dataset was cleaned as follows:
- Removed all "pseudo-Latin" text (e.g. "Lorem ipsum ...").
- Used CLTK for sentence splitting and normalisation.
- Deduplicated the corpus.
- Lowercased all text.
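The cleaning steps above could look roughly like the following stdlib-only Python sketch. It is not the author's pipeline. The card names CLTK for sentence splitting and normalisation, but here a naive regex splitter and Unicode NFC normalisation stand in for it. The `clean_corpus` function, the `PSEUDO_LATIN_MARKERS` list, and exact-match deduplication are illustrative assumptions.

```python
import re
import unicodedata
from typing import Iterable, List

# Hypothetical marker list: the card does not say how "pseudo-Latin" was detected.
PSEUDO_LATIN_MARKERS = ("lorem ipsum",)

# Naive stand-in for CLTK's Latin sentence splitter (assumption): split on
# whitespace that follows ".", "?" or "!". It does not handle abbreviations.
_SENT_BOUNDARY = re.compile(r"(?<=[.?!])\s+")


def normalise(text: str) -> str:
    """Stand-in for CLTK normalisation: Unicode NFC plus collapsed whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(documents: Iterable[str]) -> List[str]:
    """Return lowercased, deduplicated sentences from raw documents."""
    seen = set()
    sentences: List[str] = []
    for doc in documents:
        # Drop any document that contains pseudo-Latin filler text.
        if any(marker in doc.lower() for marker in PSEUDO_LATIN_MARKERS):
            continue
        # Normalise, then split into sentences.
        for sent in _SENT_BOUNDARY.split(normalise(doc)):
            # Lowercase first so that case variants deduplicate together.
            sent = sent.lower()
            # Exact-match deduplication across the whole corpus.
            if sent and sent not in seen:
                seen.add(sent)
                sentences.append(sent)
    return sentences


if __name__ == "__main__":
    docs = [
        "Gallia est omnis divisa in partes tres. Gallia est omnis divisa in partes tres.",
        "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
        "Arma virumque cano, Troiae qui primus ab oris.",
    ]
    print(clean_corpus(docs))
    # prints ['gallia est omnis divisa in partes tres.', 'arma virumque cano, troiae qui primus ab oris.']
```

Lowercasing is applied before deduplication here so that sentences differing only in case collapse into one. The card does not say in which order the steps were run.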