Cicciokr/Roberta-Base-Latin-Uncased-V2

Cicciokr committed 11 days ago

Commit ecec181 (verified) · 1 Parent(s): f614dbe

Update README.md

Files changed (1)

README.md +4 -5

@@ -6,8 +6,7 @@ This model is fine tuned with:
 - Perseus Project - 15M Token
 The dataset was cleaned:
-Removal of all "pseudo-Latin" text ("Lorem ipsum ...").
-Use of CLTK for sentence splitting and normalisation.
-deduplication of the corpus
-lowercase all text

+- Removal of all "pseudo-Latin" text ("Lorem ipsum ...").
+- Use of CLTK for sentence splitting and normalisation.
+- deduplication of the corpus
+- lowercase all text
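
The four cleaning steps listed in the README could look roughly like the sketch below. This is only an illustration under stated assumptions, not the author's actual pipeline: the README says CLTK was used for sentence splitting and normalisation, whereas this sketch swaps in a plain regex splitter and Unicode NFC normalisation so that it runs with the standard library alone. The names `clean_corpus`, `PSEUDO_LATIN`, and `SENTENCE_END` are hypothetical.

```python
import re
import unicodedata

# Assumption: placeholder text is detected by the literal phrase "lorem ipsum".
PSEUDO_LATIN = re.compile(r"lorem\s+ipsum", re.IGNORECASE)
# Assumption: a naive splitter standing in for CLTK's Latin sentence tokenizer.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def clean_corpus(texts):
    """Split texts into sentences, drop pseudo-Latin, deduplicate, lowercase."""
    seen = set()
    cleaned = []
    for text in texts:
        for sentence in SENTENCE_END.split(text):
            # Normalise Unicode form and collapse whitespace, then lowercase.
            sentence = unicodedata.normalize("NFC", sentence)
            sentence = " ".join(sentence.split()).lower()
            if not sentence or PSEUDO_LATIN.search(sentence):
                continue
            # Exact-match deduplication, keeping the first occurrence.
            if sentence in seen:
                continue
            seen.add(sentence)
            cleaned.append(sentence)
    return cleaned


if __name__ == "__main__":
    sample = [
        "Gallia est omnis divisa in partes tres. Lorem ipsum dolor sit amet.",
        "Gallia est omnis divisa in partes tres.",
    ]
    print(clean_corpus(sample))
```

Note that this sketch only drops sentences containing the phrase "lorem ipsum" itself; filler sentences that follow it without that marker would need a broader filter.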