|
--- |
|
license: mit |
|
datasets: |
|
- BiniyamAjaw/amharic_dataset_v2 |
|
language: |
|
- am |
|
--- |
|
# Amharic Tokenizer |
|
|
|
This tokenizer is trained on a large Amharic corpus and splits unseen Amharic text into subword tokens.
|
|
|
|
|
## Model Details |
|
- **Vocabulary Size:** 100,000 |
|
- **Tokenizer Type:** Byte-Pair Encoding (BPE) (see the training sketch below)
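
The exact training script lives in the repository linked under Model Sources; as a rough illustration only, a 100,000-entry BPE vocabulary can be built with the Hugging Face `tokenizers` library along these lines (the corpus file name and special tokens here are assumptions, not the actual training configuration):

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Start from an empty BPE model and split on whitespace before learning merges.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Target the 100,000-token vocabulary listed above.
trainer = BpeTrainer(vocab_size=100_000, special_tokens=["[UNK]", "[PAD]"])

# "amharic_corpus.txt" is a placeholder for the Amharic training text files.
tokenizer.train(["amharic_corpus.txt"], trainer=trainer)
tokenizer.save("amharic_bpe_tokenizer.json")
```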
|
|
|
### Model Description |
|
|
|
A Byte-Pair Encoding (BPE) tokenizer for Amharic, trained on the BiniyamAjaw/amharic_dataset_v2 dataset.
|
|
|
|
|
|
|
- **Developed by:** Biniyam Ajaw |
|
- **Language(s) (NLP):** Amharic and closely related languages
|
- **License:** MIT |
|
|
|
### Model Sources
|
|
|
- **Repository:** https://github.com/biniyam69/Amharic-LLM-Finetuning/ |
|
|
|
|
|
## Uses |
|
|
|
The tokenizer can be loaded with the `AutoTokenizer` class from the Transformers library and used to tokenize Amharic text.
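
A minimal usage sketch, assuming the tokenizer files are published on the Hugging Face Hub; the repository id below is a placeholder and should be replaced with the actual model id:

```python
from transformers import AutoTokenizer

# NOTE: the repository id is an assumption; substitute the real model id.
tokenizer = AutoTokenizer.from_pretrained("BiniyamAjaw/amharic-tokenizer")

# Tokenize a short Amharic sentence ("Hello, world").
text = "ሰላም ለዓለም"
encoding = tokenizer(text)

print(encoding["input_ids"])                                   # token ids
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))  # subword tokens
```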