---
license: apache-2.0
---

This model is fine-tuned on:
- The Latin Library - 15M tokens
- Perseus Project - 15M tokens

The dataset was cleaned as follows:
- Removal of all "pseudo-Latin" text ("Lorem ipsum ...").
- Use of CLTK for sentence splitting and normalisation.
- Deduplication of the corpus.
- Lowercasing of all text.
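
The cleaning steps listed above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the pipeline actually used for this model. The regex sentence splitter and the NFC/whitespace normalisation are stand-ins for the CLTK steps, and the names `PSEUDO_LATIN`, `split_sentences`, `normalise`, and `clean_corpus` are hypothetical. The order of operations (lowercasing before deduplication) is also an assumption.

```python
import re
import unicodedata

# Hypothetical filter: the model card only says "Lorem ipsum ..." text was removed.
PSEUDO_LATIN = re.compile(r"lorem\s+ipsum", re.IGNORECASE)


def split_sentences(text):
    """Naive stand-in for CLTK sentence splitting: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def normalise(sentence):
    """Assumed normalisation: Unicode NFC plus collapsing runs of whitespace."""
    sentence = unicodedata.normalize("NFC", sentence)
    return re.sub(r"\s+", " ", sentence).strip()


def clean_corpus(documents):
    """Return unique, lowercased sentences with placeholder text removed."""
    seen = set()
    cleaned = []
    for doc in documents:
        for sentence in split_sentences(doc):
            # Drop sentences that contain "Lorem ipsum" placeholder text.
            if PSEUDO_LATIN.search(sentence):
                continue
            # Normalise and lowercase before deduplicating, so that
            # case and spacing variants count as the same sentence.
            sentence = normalise(sentence).lower()
            if sentence in seen:
                continue
            seen.add(sentence)
            cleaned.append(sentence)
    return cleaned
```

Calling `clean_corpus` on a list of raw document strings returns a list of cleaned sentences in first-seen order. Deduplicating at sentence level is one possible choice; the original corpus may have been deduplicated at document level instead.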