---
datasets:
- SKNahin/bengali-transliteration-data
language:
- bn
- en
base_model:
- facebook/mbart-large-50
tags:
- banglish
- bangla
- translator
- avro
pipeline_tag: text2text-generation
---

# Hugging Face: Banglish to Bangla Translation

This repository shows how to translate Banglish (Romanized Bangla) text into Bangla with a Hugging Face model, using the MBart50 tokenizer and model classes. The model, `Mdkaif2782/banglish-to-bangla`, is listed in the model card metadata as based on `facebook/mbart-large-50` and is fine-tuned for this task.

## Setup in Google Colab

Follow these steps to use the model in Google Colab:

### 1. Install Dependencies

Make sure the `transformers` and `torch` libraries are installed. Run the following command in your Colab notebook:

```python
!pip install transformers torch
```

### 2. Load and Use the Model

Copy the code below into a cell in your Colab notebook to start translating Banglish to Bangla:

```python
import torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load the fine-tuned model and its tokenizer directly from Hugging Face
model_name = "Mdkaif2782/banglish-to-bangla"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Move the model to the GPU once, if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)


def translate_banglish_to_bangla(model, tokenizer, banglish_input):
    # Tokenize the input; anything beyond 128 tokens is truncated
    inputs = tokenizer(
        banglish_input,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128,
    ).to(model.device)

    # Start decoding from the Bengali language code (bn_IN)
    translated_tokens = model.generate(
        **inputs,
        decoder_start_token_id=tokenizer.lang_code_to_id["bn_IN"],
    )

    # Only the first decoded sequence is returned
    return tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]


# Read Banglish text interactively until the user types 'exit'
print("Enter your Banglish text (type 'exit' to quit):")
while True:
    banglish_text = input("Banglish: ")
    if banglish_text.strip().lower() == "exit":
        break

    # Translate Banglish to Bangla
    translated_text = translate_banglish_to_bangla(model, tokenizer, banglish_text)
    print(f"Translated Bangla: {translated_text}\n")
```
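
The function above returns only the first decoded sequence, so it handles one sentence per call. For a list of sentences, one possible approach is to group them into small batches and pass each batch to the model in a single call. The sketch below shows only the batching logic; `chunked`, `translate_many`, and the `translate_batch` callable are hypothetical names, and the batch size of 8 is an arbitrary assumption rather than a tuned value.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate_many(
    texts: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,  # assumption: adjust to the available memory
) -> List[str]:
    """Translate `texts` batch by batch, preserving the input order."""
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        outputs = translate_batch(batch)
        if len(outputs) != len(batch):
            raise ValueError("translate_batch must return one output per input")
        results.extend(outputs)
    return results


# Hypothetical adapter around the model and tokenizer loaded earlier:
#
# def translate_batch(batch):
#     inputs = tokenizer(batch, return_tensors="pt", padding=True,
#                        truncation=True, max_length=128).to(model.device)
#     tokens = model.generate(
#         **inputs, decoder_start_token_id=tokenizer.lang_code_to_id["bn_IN"])
#     return tokenizer.batch_decode(tokens, skip_special_tokens=True)
```

Passing the translation step in as a callable keeps the batching logic independent of the model, so it can be tried out with a stand-in function before any weights are downloaded.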

### 3. Run the Notebook

1. Paste the above code into a cell.
2. Run the cell.
3. Enter your Banglish text in the input prompt to get the translated Bangla text. Type `exit` to quit.

## Example Usage

Input:

```
Banglish: amar valo lagche onek
```

Output:

```
Translated Bangla: আমার ভালো লাগছে অনেক
```
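
The example above is a single short sentence. Because the loading code tokenizes with `truncation=True, max_length=128`, longer passages would be cut off at 128 tokens. One possible workaround is to split the text into shorter pieces first and translate each piece separately. The helper below is a rough sketch: `split_for_translation` is a hypothetical name, and counting words (40 by default) is only an assumed stand-in for the tokenizer's real token count.

```python
import re
from typing import List


def split_for_translation(text: str, max_words: int = 40) -> List[str]:
    """Split `text` into chunks of at most `max_words` words.

    Chunks break at sentence ends (., !, ?) where possible. The word count
    is only an approximation of the tokenizer's token count.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")

    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []

    for sentence in sentences:
        words = sentence.split()
        # A single oversized sentence is cut into fixed-size word windows.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk when this sentence would overflow the current one.
        if len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)

    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk can then be passed to `translate_banglish_to_bangla` in turn, and the outputs joined with spaces.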

## Notes

- For faster processing, use a GPU runtime in Google Colab. Go to `Runtime > Change runtime type` and select `GPU`.
- The model `Mdkaif2782/banglish-to-bangla` can be fine-tuned further if required.
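
To confirm which device the notebook will actually use after changing the runtime type, a small check along these lines may help. `pick_device` is a hypothetical helper; falling back to `"cpu"` when `torch` is not installed is an assumption made so that the snippet runs in any environment.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    # Assumption: treat a missing torch installation the same as having no GPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Using device: {pick_device()}")
```

If this prints `cpu` on a runtime that was meant to have a GPU, the runtime type change probably did not take effect.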

## License

This project uses the Hugging Face `transformers` library. Refer to the [Hugging Face documentation](https://huggingface.co/docs/transformers/) for more details.