metadata
language:
  - en
  - ru
  - zh
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - text2text-generation
  - t5
base_model:
  - utrobinmv/t5_translate_en_ru_zh_base_200
license: apache-2.0
widget:
  - example_title: translate zh-ru
    text: |
      translate to ru: 开发的目的是为用户提供个人同步翻译。
  - example_title: translate ru-en
    text: >
      translate to en: Цель разработки — предоставить пользователям личного
      синхронного переводчика.
  - example_title: translate en-ru
    text: >
      translate to ru: The purpose of the development is to provide users with a
      personal synchronized interpreter.
  - example_title: translate en-zh
    text: >
      translate to zh: The purpose of the development is to provide users with a
      personal synchronized interpreter.
  - example_title: translate zh-en
    text: |
      translate to en: 开发的目的是为用户提供个人同步解释器。
  - example_title: translate ru-zh
    text: >
      translate to zh: Цель разработки — предоставить пользователям личного
      синхронного переводчика.

T5 English, Russian and Chinese multilingual machine translation

This is a sentence-transformers model: it maps sentences and paragraphs to a 768-dimensional dense vector space. The model works well for sentence similarity tasks but does not perform as well on semantic search tasks.
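Sentence similarity between two such embeddings is usually scored with cosine similarity. The plain-Python sketch below shows the computation on toy 2-dimensional vectors standing in for real 768-dimensional embeddings; the helper name `cosine_similarity` is illustrative, and with real tensors you would normally use the library's own `sentence_transformers.util.cos_sim` instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors in place of real 768-dimensional sentence embeddings.
print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))   # 1.0 (same direction, different length)
print(cosine_similarity([3.0, 4.0], [4.0, -3.0]))  # 0.0 (orthogonal)
```

Scores near 1.0 indicate sentences the model places close together; the score ignores vector length and depends only on direction.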

The model uses only the encoder from a T5-base model.
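Because only the encoder is used, a single sentence vector has to be pooled from the per-token encoder outputs. This card does not state which pooling the model applies; the sketch below assumes mean pooling over non-padding tokens (the strategy used by the sentence-t5 models) and uses a hypothetical `mean_pool` helper on toy 2-dimensional vectors rather than 768-dimensional ones.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    Assumption: mean pooling, as in sentence-t5; check the model's
    pooling configuration before relying on this.
    """
    if not token_embeddings:
        raise ValueError("no token embeddings given")
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding positions (mask value 0)
            for i, value in enumerate(vector):
                totals[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [total / count for total in totals]


# Three encoder token vectors; the last one is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Masking matters: averaging the padding vector as well would drag the sentence embedding toward whatever the encoder outputs for padding tokens.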

Usage (Sentence-Transformers)

Using this model is easy once you have sentence-transformers installed:

pip install -U sentence-transformers

Then you can use the model like this:

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer('sentence-transformers/sentence-t5-base')
embeddings = model.encode(sentences)
print(embeddings)

Example: translate Russian to Chinese

from transformers import T5ForConditionalGeneration, T5Tokenizer

device = 'cuda'  # or 'cpu' to translate on the CPU

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)

prefix = 'translate to zh: '
src_text = prefix + "Съешь ещё этих мягких французских булок."

# translate Russian to Chinese
input_ids = tokenizer(src_text, return_tensors="pt")

generated_tokens = model.generate(**input_ids.to(device))

result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
# 再吃这些法国的甜蜜的面包。

Example: translate Chinese to Russian

from transformers import T5ForConditionalGeneration, T5Tokenizer

device = 'cuda'  # or 'cpu' to translate on the CPU

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)

prefix = 'translate to ru: '
src_text = prefix + "再吃这些法国的甜蜜的面包。"

# translate Chinese to Russian
input_ids = tokenizer(src_text, return_tensors="pt")

generated_tokens = model.generate(**input_ids.to(device))

result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
# Съешьте этот сладкий хлеб из Франции.
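Between the two examples only the prefix changes: the prompt names the target language as `translate to <code>: ` and does not state the source language. The small helper below is a hypothetical convenience (not part of the model or of transformers) that builds these prompts consistently for the three covered languages; its output is the string passed to `tokenizer(...)`.

```python
# Target-language codes from the "Languages covered" list; the prefix
# format "translate to xx: " follows the examples in this card.
SUPPORTED_TARGETS = {"en", "ru", "zh"}


def build_prompt(text, target_lang):
    """Prepend the task prefix that tells the model which language to produce."""
    if target_lang not in SUPPORTED_TARGETS:
        raise ValueError(f"unsupported target language: {target_lang!r}")
    return f"translate to {target_lang}: {text}"


# The same source strings as in the two translation examples.
print(build_prompt("Съешь ещё этих мягких французских булок.", "zh"))
print(build_prompt("再吃这些法国的甜蜜的面包。", "ru"))
```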

Languages covered

Russian (ru_RU), Chinese (zh_CN), English (en_US)