Update README.md
README.md
CHANGED
@@ -1,63 +1,69 @@
---
license: apache-2.0
base_model:
- Helsinki-NLP/opus-mt-sla-sla
pipeline_tag: translation
language:
- pl
- ru
tags:
- translation
---

* source language(s): pol
* target language(s): rus
* model: transformer
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence initial language token is required in the form of `>>id<<` (id = valid target language ID)
* download original weights: [opus-2020-07-27.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/sla-sla/opus-2020-07-27.zip)
* test set translations: [opus-2020-07-27.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/sla-sla/opus-2020-07-27.test.txt)
* test set scores: [opus-2020-07-27.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/sla-sla/opus-2020-07-27.eval.txt)

## Benchmarks

```python
from transformers import MarianMTModel, MarianTokenizer

# Load the
tokenizer = MarianTokenizer.from_pretrained(
model = MarianMTModel.from_pretrained(

# Function to translate text
def translate_text(source_text, num_translations=3):
    # Add the
    text_with_token = ">>rus<< " + source_text

    # Tokenize the input text
    inputs = tokenizer(text_with_token, return_tensors="pt")

    # Generate translations with multiple variants
    translated_tokens = model.generate(
        **inputs,
        num_return_sequences=num_translations,  # Number of translation variants
        num_beams=num_translations
    )

    # Decode the translated tokens into readable text
@@ -65,7 +71,7 @@ def translate_text(source_text, num_translations=3):
    return translations

# Main loop for text input and translation output
print("Enter a phrase to translate or !q to quit.")

while True:
    # Get input phrase from the user
@@ -84,57 +90,17 @@
    for idx, translation in enumerate(translations, 1):
        print(f"Variant {idx}: {translation}")

# Variant 1: >>rus<< О его предложении и говорить не стоит.
```

### System Info:
- hf_name: sla-sla
- source_languages: sla
- target_languages: sla
- opus_readme_url: https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/sla-sla/README.md
- original_repo: Tatoeba-Challenge
- tags: ['translation']
- languages: ['ru', 'pl']
- src_constituents: {'pol'}
- tgt_constituents: {'rus'}
- src_multilingual: True
- tgt_multilingual: True
- prepro: normalization + SentencePiece (spm32k,spm32k)
- url_model: https://object.pouta.csc.fi/Tatoeba-MT-models/sla-sla/opus-2020-07-27.zip
- url_test_set: https://object.pouta.csc.fi/Tatoeba-MT-models/sla-sla/opus-2020-07-27.test.txt
- src_alpha3: sla
- tgt_alpha3: sla
- short_pair: sla-sla
- chrF2_score: 0.672
- bleu: 48.5
- brevity_penalty: 1.0
- ref_len: 59320.0
- src_name: Slavic languages
- tgt_name: Slavic languages
- train_date: 2020-07-27
- src_alpha2: sla
- tgt_alpha2: sla
- prefer_old: False
- long_pair: sla-sla
- helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535
- transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b
- port_machine: brutasse
- port_time: 2020-08-23-14:41

## Model Card Contact

---
license: apache-2.0
base_model: Helsinki-NLP/opus-mt-sla-sla
pipeline_tag: translation
language:
- pl
- ru
tags:
- translation
- polish-to-russian
- slavic-languages
---

# Model Card: 7-Sky/skyopus-pol-rus

This model, `7-Sky/skyopus-pol-rus`, is a fine-tuned version of the `Helsinki-NLP/opus-mt-sla-sla` model, designed specifically for translating text from **Polish (pl)** to **Russian (ru)**. It is based on the Transformer architecture and uses normalization and SentencePiece tokenization (spm32k) for preprocessing.
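Since the card highlights SentencePiece (spm32k) preprocessing, it can be useful to inspect what the tokenizer produces for a given input. A minimal sketch; the Polish sentence is only an illustrative example:

```python
from transformers import MarianTokenizer

tokenizer = MarianTokenizer.from_pretrained("7-Sky/skyopus-pol-rus")

# Show the SentencePiece pieces for a Polish input. The exact pieces depend on
# the spm32k vocabulary shipped with the model; the >>rus<< target-language
# code should surface as a single special token at the start.
pieces = tokenizer.tokenize(">>rus<< Dzień dobry, jak się masz?")
print(pieces)
```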
## Model Details

- **Source Language**: Polish (`pol`)
- **Target Language**: Russian (`rus`)
- **Base Model**: [Helsinki-NLP/opus-mt-sla-sla](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/sla-sla)
- **Model Type**: Transformer
- **Preprocessing**: Normalization + SentencePiece (spm32k, spm32k)
- **Language Token**: Requires a sentence-initial token in the form `>>rus<<` to specify the target language (see the quick-start sketch below).
- **Training Date**: 2020-07-27

This model is part of the broader `sla-sla` family, originally developed for translation between Slavic languages; this variant is fine-tuned for the specific `pol -> rus` pair.

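For a quick check of the language-token convention described above, here is a minimal sketch using the high-level `transformers` pipeline. The model id is the one from this card; the example sentence is illustrative only:

```python
from transformers import pipeline

# The translation pipeline wraps MarianTokenizer/MarianMTModel, but the
# sentence-initial >>rus<< token still has to be prepended by the caller.
translator = pipeline("translation", model="7-Sky/skyopus-pol-rus")

result = translator(">>rus<< Dzień dobry, jak się masz?", max_length=128)
print(result[0]["translation_text"])
```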
## Benchmarks

- **chrF2 Score**: 0.672
- **BLEU Score**: 48.5
- **Brevity Penalty**: 1.0
- **Reference Length**: 59,320 tokens

These metrics reflect the model's performance on the Tatoeba-Challenge dataset for Slavic languages.

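For reference, BLEU and chrF scores of this kind can be computed with the `sacrebleu` package. A minimal sketch, assuming hypotheses and references are plain-text files with one sentence per line; the file names are hypothetical placeholders:

```python
import sacrebleu

# Hypothetical files: model outputs and reference translations, one sentence per line.
with open("hypotheses.rus.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("references.rus.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# Both metrics take the list of hypotheses plus a list of reference streams,
# hence the extra nesting around `references`.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])

# Note: depending on the sacrebleu version, chrF is reported on a 0-1 or 0-100 scale.
print(f"BLEU: {bleu.score:.1f}")
print(f"chrF2: {chrf.score}")
```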
## How to Use the Model

Below is an example of how to use the model with the `transformers` library in Python. The code supports generating multiple translation variants using beam search.

```python
from transformers import MarianMTModel, MarianTokenizer

# Model name on Hugging Face Hub
model_name = "7-Sky/skyopus-pol-rus"

# Load the tokenizer and model
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Function to translate text from Polish to Russian
def translate_text(source_text, num_translations=3):
    # Add the required language token for Russian
    text_with_token = ">>rus<< " + source_text

    # Tokenize the input text
    inputs = tokenizer(text_with_token, return_tensors="pt", padding=True)

    # Generate translations with multiple variants
    translated_tokens = model.generate(
        **inputs,
        num_return_sequences=num_translations,  # Number of translation variants
        num_beams=num_translations,  # Beam search; must be >= num_return_sequences
        max_length=512  # Limit output length
    )

    # Decode the translated tokens into readable text
    translations = [tokenizer.decode(t, skip_special_tokens=True) for t in translated_tokens]
    return translations

# Main loop for text input and translation output
print("Enter a Polish phrase to translate into Russian or !q to quit.")

while True:
    # Get input phrase from the user
    # ... (read the phrase, break on "!q", and call translate_text to get `translations`) ...
    for idx, translation in enumerate(translations, 1):
        print(f"Variant {idx}: {translation}")

# Example Output:
# Enter a Polish phrase to translate into Russian or !q to quit.
# Enter a phrase: Powiedzieć a zrobić to nie to samo.
# Variant 1: Сказать и сделать — не одно и то же.
# Variant 2: Сказать и сделать — это не одно и то же.
# Variant 3: Сказать и сделать — не то же самое.
#
# Enter a phrase: O jego propozycji nawet nie warto mówić.
# Variant 1: О его предложении даже не стоит говорить.
# Variant 2: О его предложении не стоит даже говорить.
# Variant 3: О его предложении и говорить не стоит.
```

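Since the tokenizer call above already uses `padding=True`, several sentences can also be translated in one batch. A minimal sketch; the Polish sentences are illustrative only:

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "7-Sky/skyopus-pol-rus"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Each input sentence needs its own sentence-initial >>rus<< token.
sentences = [
    ">>rus<< Dziękuję za pomoc.",
    ">>rus<< Gdzie jest najbliższa stacja metra?",
]

# padding=True pads the batch to the longest sentence.
batch = tokenizer(sentences, return_tensors="pt", padding=True)
outputs = model.generate(**batch, num_beams=4, max_length=512)

for src, translation in zip(sentences, tokenizer.batch_decode(outputs, skip_special_tokens=True)):
    print(f"{src} -> {translation}")
```
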
## Model Card Contact