---
library_name: transformers
tags: []
---

# PersianGemmaTokenizerFast

A Gemma tokenizer fine-tuned on Persian text, designed to handle the nuances of the Persian language with improved efficiency and accuracy. It is available on the Hugging Face Hub as [mshojaei77/PersianGemmaTokenizerFast](https://huggingface.co/mshojaei77/PersianGemmaTokenizerFast).

## Overview

**PersianGemmaTokenizerFast** builds on the architecture of the original Gemma tokenizer and is fine-tuned on Persian data. It provides faster and more accurate tokenization for Natural Language Processing (NLP) tasks involving Persian text.

## Features

- **Optimized for Persian:** Tokenization tailored to Persian language constructs.
- **Speed and efficiency:** Built on Hugging Face's fast, Rust-backed tokenizer implementation for quick processing.
- **Compatibility:** Works seamlessly with the Hugging Face Transformers library.

## Usage

Here is an example of how to use the tokenizer in Python:

```python
from transformers import AutoTokenizer

# Load the tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("mshojaei77/PersianGemmaTokenizerFast")

# Example Persian text ("Hello, how are you?")
text = "سلام، حال شما چطور است؟"

# Tokenize the text
encoded = tokenizer(text)

# Print the token IDs and the corresponding tokens
print("Token IDs:", encoded["input_ids"])
print("Tokens:", tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
```

## Comparing Performance on a Paragraph of Persian Text

The following image compares PersianGemmaTokenizerFast against other tokenizers on a paragraph of Persian text; fewer tokens for the same text indicate a more efficient tokenizer for Persian:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6556b1bb85d43542fa1a8f91/lZJKqsi4BZ8mJiY_I-vhA.png)

## Contributing

Contributions to improve the tokenizer or its documentation are welcome. If you encounter an issue or have a suggestion, please open an issue or submit a pull request.

## License

This project is licensed under the MIT License.
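
## Reproducing the Token-Count Comparison

As a rough sketch of how a comparison like the one pictured above can be reproduced, the snippet below counts the tokens each tokenizer produces for the same Persian paragraph. The base-tokenizer ID `google/gemma-2b` and the sample sentence are illustrative assumptions, not part of this model card's benchmark; note that the Gemma repositories are gated, so you may need to accept their license on the Hub and authenticate before downloading.

```python
from transformers import AutoTokenizer

# Sample Persian paragraph to compare on (any Persian text works):
# "Persian is one of the Indo-European languages spoken in Iran,
#  Afghanistan, and Tajikistan."
text = "زبان فارسی یکی از زبان‌های هندواروپایی است که در ایران، افغانستان و تاجیکستان صحبت می‌شود."

# Tokenizer IDs to compare; "google/gemma-2b" is an assumed baseline here,
# and its repo is gated (requires accepting the license on the Hub).
tokenizer_ids = [
    "mshojaei77/PersianGemmaTokenizerFast",
    "google/gemma-2b",
]

for tokenizer_id in tokenizer_ids:
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    # Skip special tokens so only the tokens produced for the text itself are counted
    token_count = len(tokenizer(text, add_special_tokens=False)["input_ids"])
    print(f"{tokenizer_id}: {token_count} tokens")
```

A lower count means the tokenizer packs the same Persian text into fewer pieces, which generally translates into shorter sequences and cheaper inference for downstream models.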