---
license: mit
datasets:
- BiniyamAjaw/amharic_dataset_v2
language:
- am
---
# Amharic Tokenizer
<!-- The tokenizer is trained on a large Amharic corpus to split unseen text into tokens. -->
## Model Details
- **Vocabulary Size:** 100,000
- **Tokenizer Type:** Byte-Pair Encoding (BPE)
### Model Description
<!-- Tokenizer that uses BPE -->
- **Developed by:** Biniyam Ajaw
- **Language(s) (NLP):** Amharic and Amharic-Driven Languages
- **License:** MIT
### Model Sources
- **Repository:** https://github.com/biniyam69/Amharic-LLM-Finetuning/
## Uses
The tokenizer can be loaded with the `AutoTokenizer` class from the `transformers` package and used to tokenize Amharic text, including text it was not trained on.
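
A minimal sketch of loading and using the tokenizer. The repository id `BiniyamAjaw/amharic_tokenizer` is an assumption inferred from the author and project name, and `load_tokenizer` and `tokenize_text` are hypothetical helper names introduced here for illustration; adjust them to match your setup.

```python
# Assumed Hub repository id; replace it if the tokenizer lives elsewhere.
REPO_ID = "BiniyamAjaw/amharic_tokenizer"


def load_tokenizer(repo_id=REPO_ID):
    """Download (or load from cache) the tokenizer from the Hugging Face Hub."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def tokenize_text(tokenizer, text):
    """Return the subword tokens and their vocabulary ids for a piece of text."""
    tokens = tokenizer.tokenize(text)
    ids = tokenizer.convert_tokens_to_ids(tokens)
    return tokens, ids
```

For example, `tokenize_text(load_tokenizer(), "ሰላም ለዓለም")` should return a list of BPE tokens and a matching list of integer ids. The exact split depends on the trained vocabulary.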