---
license: apache-2.0
base_model: Helsinki-NLP/opus-mt-sla-sla
pipeline_tag: translation
language:
- pl
- ru
tags:
- translation
- polish-to-russian
- slavic-languages
---
# Model Card: 7-Sky/skyopus-pol-rus
This model, `7-Sky/skyopus-pol-rus`, is a fine-tuned version of the `Helsinki-NLP/opus-mt-sla-sla` model, designed specifically for translating text from **Polish (pl)** to **Russian (ru)**. It is based on the Transformer architecture and uses normalization and SentencePiece tokenization (spm32k) for preprocessing.
## Model Details
- **Source Language**: Polish (`pol`)
- **Target Language**: Russian (`rus`)
- **Base Model**: [Helsinki-NLP/opus-mt-sla-sla](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/sla-sla)
- **Model Type**: Transformer
- **Preprocessing**: Normalization + SentencePiece (spm32k, spm32k)
- **Language Token**: Requires a sentence-initial token in the form `>>rus<<` to specify the target language.
- **Training Date**: 2025-03-10
- **Training Datasets**: The model was fine-tuned on a corpus that includes:
  - Medical terminology (e.g., healthcare and clinical texts)
  - Dialogue-based texts (e.g., conversational Polish and Russian)
  - Phraseological units (e.g., idioms and fixed expressions)
  - Slang vocabulary (e.g., informal and colloquial language)
  - Proverbs and sayings (e.g., culturally specific expressions)
This model is part of the broader `sla-sla` family, originally developed for translations between Slavic languages, but this variant is fine-tuned for the specific `pol -> rus` pair.
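The language-token requirement listed above can be handled with a small helper before tokenization. The following is a minimal sketch, not part of the model's own tooling: the function name `add_target_token` and its `target_lang` parameter are hypothetical, and the only thing taken from this card is the `>>rus<<` token format.

```python
def add_target_token(sentences, target_lang="rus"):
    """Prefix each sentence with the >>lang<< token the model expects.

    Hypothetical helper: the name and signature are illustrative only.
    """
    token = f">>{target_lang}<<"
    prepared = []
    for sentence in sentences:
        stripped = sentence.strip()
        # Skip sentences that already start with the token to avoid doubling it.
        if stripped.startswith(token):
            prepared.append(stripped)
        else:
            prepared.append(f"{token} {stripped}")
    return prepared


batch = add_target_token(["Dzień dobry.", ">>rus<< Dziękuję."])
print(batch)  # ['>>rus<< Dzień dobry.', '>>rus<< Dziękuję.']
```

The prepared strings can then be passed to the tokenizer as a list, in the same way a single string is passed in the usage example.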
## Benchmarks
- **chrF2 Score**: 0.672
- **BLEU Score**: 47.6
- **Brevity Penalty**: 1.0
- **Reference Length**: 59,320 tokens
These metrics reflect the model's performance on the Tatoeba-Challenge dataset for Slavic languages.
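For readers unfamiliar with the brevity penalty reported above, the sketch below shows the standard BLEU definition: the penalty is 1.0 when the system output is at least as long as the reference, and `exp(1 - ref_len / hyp_len)` otherwise. The reference length of 59,320 comes from this card; the hypothesis lengths are hypothetical values chosen for illustration, and the function is not the evaluation script used to produce the scores.

```python
import math


def brevity_penalty(hyp_len, ref_len):
    """Standard BLEU brevity penalty for total hypothesis and reference lengths."""
    if hyp_len <= 0:
        return 0.0
    if hyp_len >= ref_len:
        # Output is not shorter than the reference, so no penalty applies.
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


REF_LEN = 59_320  # reference length reported in this card

# Hypothetical hypothesis lengths, for illustration only.
print(brevity_penalty(59_320, REF_LEN))            # 1.0
print(round(brevity_penalty(53_388, REF_LEN), 3))  # 0.895 (output 10% shorter)
```

A reported penalty of 1.0 therefore indicates that the translations were, in total, not shorter than the references.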
## How to Use the Model
Below is an example of how to use the model with the `transformers` library in Python. The code supports generating multiple translation variants using beam search.
```python
from transformers import MarianMTModel, MarianTokenizer

# Model name on Hugging Face Hub
model_name = "7-Sky/skyopus-pol-rus"

# Load the tokenizer and model
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)


# Function to translate text from Polish to Russian
def translate_text(source_text, num_translations=3):
    # Add the required language token for Russian
    text_with_token = ">>rus<< " + source_text
    # Tokenize the input text
    inputs = tokenizer(text_with_token, return_tensors="pt", padding=True)
    # Generate translations with multiple variants
    translated_tokens = model.generate(
        **inputs,
        num_return_sequences=num_translations,  # Number of translation variants
        num_beams=num_translations,  # Beam count must be >= num_return_sequences
        max_length=512,  # Limit output length
    )
    # Decode the translated tokens into readable text
    translations = [
        tokenizer.decode(tokens, skip_special_tokens=True)
        for tokens in translated_tokens
    ]
    return translations


# Main loop for text input and translation output
print("Enter a Polish phrase to translate into Russian or !q to quit.")
while True:
    # Get input phrase from the user
    source_text = input("Enter a phrase: ")
    # Check for the quit command
    if source_text == "!q":
        print("Exiting the program.")
        break
    # Translate the phrase with multiple variants
    translations = translate_text(source_text)
    if translations:
        # Output all translation variants
        for idx, translation in enumerate(translations, 1):
            print(f"Variant {idx}: {translation}")

# Example Output:
# Enter a Polish phrase to translate into Russian or !q to quit.
# Enter a phrase: Powiedzieć a zrobić to nie to samo.
# Variant 1: Сказать и сделать — не одно и то же.
# Variant 2: Сказать и сделать — это не одно и то же.
# Variant 3: Сказать и сделать — не то же самое.
#
# Enter a phrase: O jego propozycji nawet nie warto mówić.
# Variant 1: О его предложении даже не стоит говорить.
# Variant 2: О его предложении не стоит даже говорить.
# Variant 3: О его предложении и говорить не стоит.
```
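The interactive example above translates one phrase at a time. For a list of sentences, a batched variant may be more convenient. The following is a rough sketch under stated assumptions: `chunked` and `translate_batch` are hypothetical helper names, the default `batch_size` and `num_beams` values are arbitrary, and the function expects an already loaded `tokenizer` and `model` such as the ones created in the example above.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(texts, tokenizer, model, batch_size=8, num_beams=4):
    """Translate a list of Polish sentences, returning one Russian string each.

    Hypothetical helper: `tokenizer` and `model` are assumed to be the
    MarianTokenizer and MarianMTModel instances loaded for this model.
    """
    results = []
    for chunk in chunked(texts, batch_size):
        # Prepend the required target-language token to every sentence.
        prepared = [">>rus<< " + text for text in chunk]
        inputs = tokenizer(
            prepared, return_tensors="pt", padding=True, truncation=True
        )
        outputs = model.generate(**inputs, num_beams=num_beams, max_length=512)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results


# Usage, assuming `tokenizer` and `model` were loaded as shown earlier:
# sentences = ["Dzień dobry.", "Dziękuję bardzo."]
# for source, target in zip(sentences, translate_batch(sentences, tokenizer, model)):
#     print(source, "->", target)
```

Padding lets sentences of different lengths share a batch, and splitting the input into chunks keeps memory use bounded for long lists.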
## Dear users and language enthusiasts,
Your support has always been the driving force behind innovation, and today, I’m excited to share how you can help take this project to the next level. Together, we’ve built a unique translation model using Marian, trained on a custom dataset that pushes the boundaries of language understanding. But this is just the beginning!
To continue improving the model, expanding the dataset, and ensuring faster, more accurate translations, we need your help. Your contributions will go directly toward:
- **Enhancing the dataset**: Adding more diverse and high-quality data to make the model even smarter.
- **Acquiring powerful hardware**: Training advanced models requires serious computational power, and your support will help us access the resources needed to make this happen.

Every contribution, no matter how small, brings us closer to a future where language barriers are a thing of the past. If you believe in this mission and want to see this project grow, consider supporting us by clicking the button below to Buy Me a Coffee.
Your support isn’t just a donation—it’s an investment in the future of communication. Let’s build something extraordinary together!
<a href="https://buycoffee.to/skyweb117" target="_blank"><img src="https://buycoffee.to/img/share-button-primary.png" style="width: 166px; height: 43px" alt="Buy me a coffee on buycoffee.to"></a>