---
language:
- en
- ru
- zh
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- text2text-generation
- t5
base_model:
- utrobinmv/t5_translate_en_ru_zh_base_200
license: apache-2.0
widget:
- example_title: translate zh-ru
  text: |
    translate to ru: 开发的目的是为用户提供个人同步翻译。
- example_title: translate ru-en
  text: >
    translate to en: Цель разработки — предоставить пользователям личного
    синхронного переводчика.
- example_title: translate en-ru
  text: >
    translate to ru: The purpose of the development is to provide users with a
    personal synchronized interpreter.
- example_title: translate en-zh
  text: >
    translate to zh: The purpose of the development is to provide users with a
    personal synchronized interpreter.
- example_title: translate zh-en
  text: |
    translate to en: 开发的目的是为用户提供个人同步解释器。
- example_title: translate ru-zh
  text: >
    translate to zh: Цель разработки — предоставить пользователям личного
    синхронного переводчика.
---
# T5 English, Russian and Chinese multilingual machine translation
This is a sentence-transformers model: it maps sentences and paragraphs to a 768-dimensional dense vector space. The model works well for sentence-similarity tasks, but is noticeably weaker on semantic-search tasks.

The model uses only the encoder from a T5-base model.
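For reference, the encoder can also be loaded on its own with 🤗 Transformers. The sketch below is illustrative rather than equivalent to the sentence-transformers pipeline (which applies its own pooling); the checkpoint and the mean-pooling step are assumptions:

```python
from transformers import T5EncoderModel, T5Tokenizer
import torch

# Assumed checkpoint: any full T5 checkpoint works; decoder weights are simply skipped
model_name = 'utrobinmv/t5_translate_en_ru_zh_base_200'
tokenizer = T5Tokenizer.from_pretrained(model_name)
encoder = T5EncoderModel.from_pretrained(model_name)

inputs = tokenizer("This is an example sentence", return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

# Mean-pool token embeddings into a single 768-dimensional vector (illustrative pooling choice)
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # torch.Size([1, 768])
```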
## Usage (Sentence-Transformers)
Using this model is easy once you have sentence-transformers installed:

```bash
pip install -U sentence-transformers
```
Then you can use the model like this:
```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('sentence-transformers/sentence-t5-base')
embeddings = model.encode(sentences)
print(embeddings)
```
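For sentence-similarity use, the resulting embeddings can be compared with the cosine-similarity helper that ships with sentence-transformers; a short sketch:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('sentence-transformers/sentence-t5-base')

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the two sentence embeddings
score = util.cos_sim(embeddings[0], embeddings[1])
print(score)
```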
## Example: translate Russian to Chinese
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = 'cuda'  # or 'cpu' to run on CPU

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)

prefix = 'translate to zh: '
src_text = prefix + "Съешь ещё этих мягких французских булок."

# translate Russian to Chinese
input_ids = tokenizer(src_text, return_tensors="pt")

generated_tokens = model.generate(**input_ids.to(device))

result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
# 再吃这些法国的甜蜜的面包。
```
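Several sentences can be translated in one call by padding the batch; the sketch below follows the same setup (per-input prefixes and padding behaviour are assumptions, not documented guarantees of this checkpoint):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = 'cpu'  # or 'cuda' if a GPU is available
model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name).to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)

# Each input carries its own target-language prefix
src_texts = [
    'translate to zh: Съешь ещё этих мягких французских булок.',
    'translate to en: Съешь ещё этих мягких французских булок.',
]

inputs = tokenizer(src_texts, return_tensors="pt", padding=True).to(device)
generated = model.generate(**inputs)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```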
## Example: translate Chinese to Russian
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = 'cuda'  # or 'cpu' to run on CPU

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)

prefix = 'translate to ru: '
src_text = prefix + "再吃这些法国的甜蜜的面包。"

# translate Chinese to Russian
input_ids = tokenizer(src_text, return_tensors="pt")

generated_tokens = model.generate(**input_ids.to(device))

result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
# Съешьте этот сладкий хлеб из Франции.
```
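Output length and decoding strategy can be tuned through the standard `generate()` arguments; the settings below are illustrative, not values recommended by the model authors:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = 'cpu'  # or 'cuda' if a GPU is available
model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name).to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)

inputs = tokenizer('translate to en: 再吃这些法国的甜蜜的面包。', return_tensors="pt").to(device)

# Illustrative generation settings
generated = model.generate(
    **inputs,
    max_new_tokens=128,   # cap the translation length
    num_beams=4,          # beam search often gives steadier translations
    early_stopping=True,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```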
## Languages covered
Russian (ru_RU), Chinese (zh_CN), English (en_US)
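The target language is selected purely by the prompt prefix, so a small helper can keep the prefixes in one place (the helper itself is hypothetical, not part of the model's API):

```python
# Hypothetical convenience helper for building prompts
PREFIXES = {
    'ru': 'translate to ru: ',
    'en': 'translate to en: ',
    'zh': 'translate to zh: ',
}

def make_prompt(text: str, target_lang: str) -> str:
    return PREFIXES[target_lang] + text

print(make_prompt("Hello, world!", 'zh'))
# translate to zh: Hello, world!
```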